GitHub migration guide

This guide will walk you through the process of migrating from Port's existing GitHub Cloud App to our new and improved GitHub Integration, which is powered by Ocean.

Improvements

The new Ocean-powered GitHub integration comes with several key improvements:

More authentication options - You can now connect the integration using a Personal Access Token (PAT) that you control, giving you more flexibility.
Enhanced performance - Faster resyncs thanks to improved API efficiency, making your data available sooner.
Better selectors - More granular control over what you sync, with improved selectors for pull requests, issues, Dependabot alerts, code scanning alerts, files, and folders.

Multi-organization support

The GitHub integration supports ingesting data from multiple GitHub organizations starting from version 3.0.0-beta. Configure a single organization using githubOrganization in your environment variables, or list multiple organizations in your port mapping under organizations:

deleteDependentEntities: true
createMissingRelatedEntities: true
enableMergeEntity: true
organizations:
  - org1
  - org2
# ... rest of your mapping (repositoryType, resources, etc.) ...

Precedence: If githubOrganization is set in the environment variables or config and organizations are also listed in the port mapping, the integration syncs only githubOrganization (single-org behavior).
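
As a hedged illustration of the precedence rule, the fragment below places the Helm value and the port mapping side by side. The organization names are placeholders, and in practice the two parts live in separate files; they are shown as two YAML documents only to make the conflict visible.

```yaml
# Helm values (placeholder organization name)
integration:
  config:
    githubOrganization: "org1" # set here, so single-org behavior applies
---
# Port mapping
organizations: # not used for syncing while githubOrganization is set
  - org1
  - org2
```

To sync both org1 and org2 in this sketch, githubOrganization would be removed from the environment variables or config, leaving only the organizations list.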

Authentication model

Personal access token (PAT)

You can now authenticate with our GitHub integration using a Personal Access Token (PAT) instead of a GitHub App. This gives you more control over the integration's permissions. For more details, see the installation page.

Classic PAT required for multi-org

For multi-organization support, you must use a classic Personal Access Token. Fine-grained PATs do not work with multi-organization configurations.

Below is a sample Helm value for this configuration:

integration:
  secrets:
    githubToken: "<GITHUB_PAT>"

GitHub App

If you prefer using a GitHub App, you can still authenticate with our Ocean-powered GitHub integration. You will need to create the app yourself, which is a process similar to our existing self-hosted app installation. This process is documented here.

Single organization limitation

GitHub App authentication only supports one organization at a time. You must specify exactly one organization using githubOrganization.

Below is a sample Helm value for this configuration:

integration:
  config:
    githubAppId: "<GITHUB_APP_ID>" # app client id also works
    githubOrganization: "my-org"  # Required for single organization support regardless of token type
  secrets:
    githubAppPrivateKey: "<BASE_64_ENCODED_PRIVATEKEY>"

Webhooks

The integration now automatically configures webhooks on GitHub to receive live events. To enable this, you must grant your PAT permission to create organization webhooks. Additionally, you need to provide a public URL for the integration. You can do this by setting liveEvents.baseUrl when deploying with Helm or ArgoCD, or by setting the OCEAN__BASE_URL environment variable when running the integration as a Docker container. For more details, please refer to the live events documentation.
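
A minimal Helm values sketch for live events, assuming the dotted name liveEvents.baseUrl maps to a nested key in the values file, as is usual for Helm. The host is a placeholder, and the exact layout should be checked against the live events documentation.

```yaml
integration:
  secrets:
    githubToken: "<GITHUB_PAT>" # the PAT needs permission to create organization webhooks
liveEvents:
  baseUrl: "https://<YOUR_PUBLIC_HOST>" # placeholder: public URL where GitHub can reach the integration
```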

Deployment

We've expanded our self-hosted installation examples to support deploying on a Kubernetes cluster using Helm or ArgoCD. For more details, please refer to the deployment documentation.

Workflow runs

We have increased the number of workflow runs ingested for each workflow in a repository. The new integration fetches up to 100 of the latest runs per workflow, up from the previous limit of 10 per repository.

Repository type

You can now specify the type of repositories (public, private, or all) from which to ingest data. All other data kinds that are associated with repositories (like pull requests, issues, etc.) will only be fetched from repositories that match this configuration.

repositoryType: 'all' # ✅ fetch pull requests from all repositories. can also be "private", "public", etc
resources:
  - kind: pull-request
    selector:
      # ...

Kind mapping changes

The data blueprints for GitHub have been updated to provide cleaner data structures and improved relationships between different software catalog entities. Understanding these changes is crucial for a smooth migration.

A key change is how we denote custom attributes. We now add a double underscore prefix (e.g., __repository) to attributes that Port adds to the raw API response from GitHub. This makes it clear which fields are part of the original data and which are enrichments from the integration.

Files & GitOps

Organization field in file selectors

The organization field is optional when githubOrganization is set in the environment variables, and required when it is not.

Existing configuration:

resources:
  - kind: file
    selector:
      query: 'true'
      files:
        # Note that glob patterns are supported, so you can use wildcards to match multiple files
        - path: '**/package.json'
        # The `repos` key can be used to filter the repositories from which the files will be fetched
          repos:
            - "MyRepo" # ❌ changed
    port:
      entity:
        mappings:
          identifier: .file.path # ❌ Changed
          title: .file.name
          blueprint: '"manifest"'
          properties:
            project_name: .file.content.name
            project_version: .file.content.version
            license: .file.content.license

New configuration:

resources:
  - kind: file
    selector:
      query: 'true'
      files:
        # Note that glob patterns are supported, so you can use wildcards to match multiple files
        - path: '**/package.json'
          organization: my-org # Optional if githubOrganization is set; required if not set
          # The `repos` key can be used to filter the repositories and branch where files should be fetched
          repos:
            - name: MyRepo # ✅  new key:value pairs rather than a string.
              branch: main # ✅  new optional branch name for each specified repository
            - name: MyOtherRepo
    port:
      entity:
        mappings:
          identifier: .path
          title: .name
          blueprint: '"manifest"'
          properties:
            project_name: .content.name
            project_version: .content.version
            license: .content.license

Here are the key changes for file mappings:

The organization field can be specified per file pattern when no global organization is configured.
The repos selector is now a list of objects, where each object can specify the repository name and an optional branch. This provides more granular control over which files are fetched.
File attributes are no longer nested under a file key. They are now at the top level of the data structure. For example, instead of .file.path, you should now use .path.
The repo key has been renamed to repository when referencing the repository a file belongs to, for consistency with other data kinds.
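
To make the per-pattern organization field concrete, here is a sketch of a file selector for a multi-organization setup where githubOrganization is not set. The organization and repository names are placeholders.

```yaml
resources:
  - kind: file
    selector:
      query: 'true'
      files:
        - path: '**/package.json'
          organization: org1 # placeholder organization
          repos:
            - name: MyRepo
              branch: main
        - path: '**/package.json'
          organization: org2 # placeholder organization
          repos:
            - name: MyOtherRepo # branch is optional
    # ... port mapping as in the new configuration above ...
```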

Repository relationships

Fetching related data for a repository, like teams and collaborators, is now managed through a unified include selector. This replaces the previous method of using separate boolean flags for each data type, offering a more consistent and streamlined approach.

Repository and teams

Existing configuration:

resources:
- kind: repository
  selector:
    query: "true" # JQ boolean query. If evaluated to false - skip syncing the object.
    teams: true # ❌ changed
  port:
    entity:
      mappings:
        identifier: .name
        title: .name
        blueprint: '"githubRepository"'
        properties:
          readme: file://README.md
          url: .html_url
          defaultBranch: .default_branch
        relations:
          githubTeams: "[.teams[].id | tostring]" # ❌ changed

New configuration:

resources:
- kind: repository
  selector:
    query: "true"
    include: ["teams"] # ✅ new
  port:
    entity:
      mappings:
        identifier: .name
        title: .name
        blueprint: '"githubRepository"'
        properties:
          readme: file://README.md
          url: .html_url
          defaultBranch: .default_branch
        relations:
          githubTeams: '[.__teams[].id | tostring]' # ✅ new

Repository and collaborators

Existing configuration:

resources:
- kind: repository
  selector:
    query: "true"
    collaborators: true # ❌ changed
  port:
    entity:
      mappings:
        identifier: .name
        title: .name
        blueprint: '"githubRepository"'
        properties:
          readme: file://README.md
          url: .html_url
          defaultBranch: .default_branch
        relations:
          collaborators: "[.collaborators[].login]" # ❌ changed

New configuration:

resources:
- kind: repository
  selector:
    query: "true"
    include: ["collaborators"] # ✅ new
  port:
    entity:
      mappings:
        identifier: .name
        title: .name
        blueprint: '"githubRepository"'
        properties:
          readme: file://README.md
          url: .html_url
          defaultBranch: .default_branch
        relations:
          collaborators: '[.__collaborators[].login]' # ✅ new
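
Since include takes a list, it seems reasonable to request both relationships in a single resource. The sketch below assumes that listing both values together is supported, which this guide does not state explicitly.

```yaml
resources:
- kind: repository
  selector:
    query: "true"
    include: ["teams", "collaborators"] # assumption: both values can be listed together
  port:
    entity:
      mappings:
        identifier: .name
        title: .name
        blueprint: '"githubRepository"'
        relations:
          githubTeams: '[.__teams[].id | tostring]'
          collaborators: '[.__collaborators[].login]'
```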

Issues

We've introduced a new state selector. This allows you to filter which issues are ingested based on their state (e.g., open, closed).

Existing configuration:

resources:
  - kind: issue
    selector:
      query: ".pull_request == null" # JQ boolean query. If evaluated to false - skip syncing the object.
    port:
      entity:
        mappings:
          identifier: ".repo + (.id|tostring)"
          title: ".title"
          blueprint: '"githubIssue"'
          properties:
            creator: ".user.login"
            assignees: "[.assignees[].login]"
            labels: "[.labels[].name]"
            status: ".state"
            createdAt: ".created_at"
          relations:
            repository: ".repo" # ❌  changed

New configuration:

resources:
  - kind: issue
    selector:
      query: ".pull_request == null" # JQ boolean query. If evaluated to false - skip syncing the object.
      state: "closed" # ✅  new
    port:
      entity:
        mappings:
          identifier: ".__repository + (.id|tostring)"
          title: ".title"
          blueprint: '"githubIssue"'
          properties:
            creator: ".user.login"
            assignees: "[.assignees[].login]"
            labels: "[.labels[].name]"
            status: ".state"
            createdAt: ".created_at"
            closedAt: ".closed_at"
            updatedAt: ".updated_at"
            description: ".body"
            issueNumber: ".number"
            link: ".html_url"
          relations:
            repository: ".__repository" # ✅  new, uses a double underscore prefix to indicate custom enrichment.

Pull requests

We've introduced new selectors to give you more control over which pull requests are ingested. The states selector allows you to filter pull requests by their state (e.g., open, closed). Additionally, you can use maxResults to limit the number of closed pull requests fetched and since to fetch pull requests created within a specific time period (in days).

Existing configuration:

resources:
  - kind: pull-request
    selector:
      query: "true" # JQ boolean query. If evaluated to false - skip syncing the object.
    port:
      entity:
        mappings:
          identifier: ".head.repo.name + (.id|tostring)" # The Entity identifier will be the repository name + the pull request ID.
          title: ".title"
          blueprint: '"githubPullRequest"'
          properties:
            creator: ".user.login"
            assignees: "[.assignees[].login]"
            reviewers: "[.requested_reviewers[].login]"
            status: ".status" # merged, closed, opened
            closedAt: ".closed_at"
            updatedAt: ".updated_at"
            mergedAt: ".merged_at"
            createdAt: ".created_at"
          relations:
            repository: .head.repo.name

New configuration:

resources:
  - kind: pull-request
    selector:
      query: "true" # JQ boolean query. If evaluated to false - skip syncing the object.
      states: ["open"] # ✅ new
      maxResults: 50 # ✅ new, limits closed PRs to 50 (capped at 300)
      since: 60  # ✅ new, fetches closed PRs from the last 60 days (capped at 90 days)
    port:
      entity:
        mappings:
          identifier: ".head.repo.name + (.id|tostring)" # The Entity identifier will be the repository name + the pull request ID.
          title: ".title"
          blueprint: '"githubPullRequest"'
          properties:
            creator: ".user.login"
            assignees: "[.assignees[].login]"
            reviewers: "[.requested_reviewers[].login]"
            status: ".state" # merged, closed, opened
            closedAt: ".closed_at"
            updatedAt: ".updated_at"
            mergedAt: ".merged_at"
            createdAt: ".created_at"
            prNumber: ".id"
          relations:
            repository: .__repository #  ✅ new, it is now obvious when an attribute is added to the raw API response by the integration.
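
Because maxResults and since apply to closed pull requests, a selector that targets closed PRs shows them in action. This is a sketch with illustrative values, kept below the stated caps of 300 results and 90 days.

```yaml
resources:
  - kind: pull-request
    selector:
      query: "true"
      states: ["closed"] # ingest closed pull requests only
      maxResults: 100 # illustrative value; capped at 300
      since: 30 # illustrative value in days; capped at 90
    port:
      entity:
        mappings:
          identifier: ".head.repo.name + (.id|tostring)"
          title: ".title"
          blueprint: '"githubPullRequest"'
          relations:
            repository: .__repository
```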

Folders

Organization field in folder selectors

The organization field is optional when githubOrganization is set in the environment variables and is required when not provided (e.g., Classic PAT with multiple organizations defined in your port mapping).

For the folder kind, the folder.name attribute is no longer part of the response. Instead, you can easily derive the folder name from the folder.path using a JQ expression, as shown in the example below:

Existing configuration:

resources:
- kind: folder
  selector:
    query: "true"
    folders: 
      - path: apps/*
        repos:
          - backend-service # ❌  changed
  port:
    entity:
      mappings:
        identifier: ".folder.name" # ❌  changed
        title: ".folder.name" # ❌  changed
        blueprint: '"githubRepository"'
        properties:
          url: .repo.html_url + "/tree/" + .repo.default_branch  + "/" + .folder.path # ❌  changed
          readme: file://README.md

New configuration:

resources:
- kind: folder
  selector:
    query: "true"
    folders: 
      - path: apps/*
        organization: my-org # Optional if githubOrganization is set; required if not set
        repos:
          - name: backend-service # ✅  new, now has a 'name' key
            branch: main # ✅  new, optional branch name
  port:
    entity:
      mappings:
        identifier: '.folder.path | split("/") | last' # ✅  new, derived using JQ (JQ string literals use double quotes)
        title: '.folder.path | split("/") | last'
        blueprint: '"githubRepository"'
        properties:
          url: .__repository.html_url + "/tree/" + .__repository.default_branch  + "/" + .folder.path # ✅  new, repository is a custom enrichment
          readme: file://README.md
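
The folder name derivation can be traced step by step. The path below is a sample value chosen for illustration, not output from a real sync.

```yaml
# Worked example for a folder matched by `apps/*`:
#   .folder.path                      -> "apps/backend"
#   .folder.path | split("/")         -> ["apps", "backend"]
#   .folder.path | split("/") | last  -> "backend"
identifier: '.folder.path | split("/") | last'
title: '.folder.path | split("/") | last'
```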

Teams

To improve performance when fetching team members, we now use GitHub's GraphQL API instead of the REST API.

This change has two main consequences:

The ID for a team may differ depending on whether you are fetching its members. This is due to differences between GitHub's REST and GraphQL APIs.
Team members are now located in a nodes subarray within the team object.

Existing configuration:

- kind: team
  selector:
    query: 'true'
    members: true # ✅  unchanged
  port:
    entity:
      mappings:
        identifier: .id | tostring
        title: .name
        blueprint: '"githubTeam"'
        properties:
          slug: .slug
          description: .description
          link: .url
        relations:
          team_member: '[.members[].login]' # ❌  changed

New configuration:

- kind: team
  selector:
    query: 'true'
    members: true # ✅  unchanged
  port:
    entity:
      mappings:
        identifier: .id # tostring is not necessary, the GraphQL ID is already a string
        title: .name
        blueprint: '"githubTeam"'
        properties:
          slug: .slug
          description: .description
          link: .url
        relations:
          team_member: '[.members.nodes[].login]' # ✅  new, nodes subarray
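
If teams are synced without members, the ID presumably comes from the REST API, where team IDs are numeric. The sketch below assumes that behavior and keeps tostring so the identifier is a string in either mode; JQ's tostring leaves strings unchanged.

```yaml
- kind: team
  selector:
    query: 'true'
    members: false # assumption: without members, the ID comes from the REST API
  port:
    entity:
      mappings:
        identifier: .id | tostring # works for both numeric and string IDs
        title: .name
        blueprint: '"githubTeam"'
        properties:
          slug: .slug
          description: .description
          link: .url
```

Since the ID may differ between the two modes, toggling members on an existing setup may produce team entities with different identifiers.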

Other changes

`dependabot-alert`

The dependabot-alert kind now supports a states selector. This allows you to specify an array of states (e.g., open, fixed) to control which alerts are ingested:

resources:
  - kind: dependabot-alert
    selector:
      query: "true"
      states: # ✅  new
        - "open"
        - "fixed"

`code-scanning-alerts`

The code-scanning-alerts kind now supports a state selector. This allows you to specify a single state (e.g., open) to control which alerts are ingested:

resources:
  - kind: code-scanning-alerts
    selector:
      query: "true"
      state: open # ✅  new

Summary of key changes

This section provides a high-level summary of the key changes for mappings.

| Area | Old Value | New Value | Notes |
| --- | --- | --- | --- |
| Multi-Organization | N/A | `githubOrganization` is now optional | Classic PAT supports multiple orgs using the `organizations` list in the port mapping; GitHub App and fine-grained PAT do not support multiple organizations and therefore require the `githubOrganization` configuration. Syncing multiple organizations increases API calls and may slow down the integration. |
| File Organization | N/A | `organization: "my-org"` | Optional if `githubOrganization` is set; required when not set (e.g., Classic PAT multi-org). |
| Folder Organization | N/A | `organization: "my-org"` | Optional if `githubOrganization` is set; required when not set (e.g., Classic PAT multi-org). |
| Authentication | GitHub App Installation | PAT or Self-Created GitHub App | The integration can be authenticated using a Personal Access Token (PAT) or a self-created GitHub App. Multi-org requires classic PAT. |
| Webhooks | App Webhook | Automatic Setup by Integration | The integration now manages its own webhooks for live events. This requires `webhook` permissions and `liveEvents.baseUrl` to be set. |
| Workflow Runs | 10 per repository | 100 per workflow | The number of ingested workflow runs has been increased. |
| Repository Type | N/A | `repositoryType` configuration | A new top-level configuration is available to filter repositories by type (`public`, `private`, or `all`). |
| Repository Relationships | `teams: true`, `collaborators: true` | `include: ["teams"]`, `include: ["collaborators"]` | The `include` selector replaces boolean flags for fetching related data. The fetched data is also now prefixed with `__` (e.g., `.__teams`). |
| Pull Requests | N/A | `states`, `maxResults`, `since` selectors | New selectors are available for more granular filtering. |
| File properties | `.file.path` | `.path` | All file properties are now at the top level of the object, no longer nested under `.file`. |
| Repository reference | `.repo` or `.head.repo.name` | `.__repository` | The integration now consistently provides repository information under the `__repository` field for all relevant kinds. |
| Folder name | `.folder.name` | `.folder.path \| split("/") \| last` | The folder name is no longer directly available and should be derived from the folder path using a JQ expression. |
