Allow to remove forks from the analysis #109

@warenlg

When analyzing sourced's codebase with the dashboard, huge repos that were forked under src-d make some visuals irrelevant and even distort the overview of the codebase and the company's activity. For example:

[Screenshot: languages_srcd]

[Screenshot: commits_srcd]

Here, the analysis includes src-d/or-tools, a huge C repo that somebody forked one day, and I think by default it should not. Another consequence is that all the contributors of those forks are analyzed in the dashboard and thus presented as the analyzed company's contributors, even though they never made a single contribution to any of the company's repos.

I tried to filter them out with gitbase one day, in vain. So, to get a relevant analysis of a company, I fetch the data myself with the GitHub API, excluding forks:

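#!/bin/bash
# Clone every non-fork repository of the src-d organization.

output_directory="/home/waren/tmp"
mkdir -p "$output_directory"
# The list endpoint already returns "fork" and "html_url" for each repo,
# so no per-repo request is needed. Only the first 100 repos are fetched.
repos=$(curl -s "https://api.github.com/orgs/src-d/repos?page=1&per_page=100" | jq -r '.[] | select(.fork == false) | .html_url')
for html_url in $repos; do
    # Clone into one subdirectory per repo; a single shared target fails after the first clone.
    git clone "$html_url" "$output_directory/$(basename "$html_url")"
done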
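
For organizations with more than 100 repositories, the same idea would need to follow the API's pagination. A minimal sketch of that, assuming `curl` and `jq` are installed and unauthenticated requests are not rate limited; the function names `non_fork_urls` and `list_org_non_forks` are hypothetical, chosen only for this illustration:

```shell
#!/bin/bash
# Sketch: list the URLs of an organization's non-fork repositories,
# following GitHub API pagination. Function names are illustrative only.

# Read a JSON array of repository objects on stdin and print the html_url
# of every repository whose "fork" field is false.
non_fork_urls() {
    jq -r '.[] | select(.fork == false) | .html_url'
}

# Page through /orgs/<org>/repos until an empty page is returned.
list_org_non_forks() {
    local org="$1" page=1 body count
    while :; do
        body=$(curl -s "https://api.github.com/orgs/${org}/repos?page=${page}&per_page=100")
        # Count the repos on this page; a non-array response (for example an
        # error object) or unparsable output counts as zero and stops the loop.
        count=$(printf '%s' "$body" | jq 'if type == "array" then length else 0 end' 2>/dev/null)
        if [ "${count:-0}" -eq 0 ]; then
            break
        fi
        printf '%s' "$body" | non_fork_urls
        page=$((page + 1))
    done
}
```

The output of `list_org_non_forks src-d` could then be piped into a `while read -r url` loop that runs `git clone "$url" "/home/waren/tmp/$(basename "$url")"` for each line. The filtering is kept in its own function so it can be checked against a small JSON sample without touching the network.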
