Git as an SSH Application

See the overview for more about why and where this is headed.

When setting up auth for shared git servers long ago (before GitHub was a thing), I’d commonly use user accounts on a Linux server that we used for git hosting. Those users would have public keys in their own authorized_keys files in the .ssh directory of their home directory. If it was a shared repo, we’d place it in a path that all users could access, and that path would be part of the git remote path.

This worked fine for a few users collaborating on a shared server, but wouldn’t be a fit for a large hosted git service. First, it wouldn’t make sense for every user to have a system account. All those system accounts would carry a lot of overhead, and I’m sure the approach would break at the scale of GitHub. Second, we want to be able to run git over ssh on multiple servers, so we can scale, load balance, and replace/deploy as needed. That means we don’t want all of the ssh servers to need the same system user accounts or all of their public ssh keys. We should be able to look up users and keys from a central database or auth service.

How does git over ssh work?

When pushing to or pulling from a repo over ssh, what happens? The git CLI on your local machine invokes ssh to make a secure connection to the remote server. It has ssh run a command on the remote machine to receive the data that the local client is sending, or to send the data it is requesting. Both sides have the same commands, and push and pull just define which direction the data flows. The naming of the internal commands gets a bit confusing, as even the docs mention, but mainly we are talking about git-receive-pack (run on the server when you push) and git-upload-pack (run on the server when you fetch or pull). The communication between the two sides takes place over ssh, piping output and input between them.
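As a rough illustration of what the client asks the server to run, here is a small Go sketch. The helper name remoteCommand and the example repo path are made up for this post; the only thing it encodes is the mapping described above, where a fetch, pull, or clone runs git-upload-pack on the remote and a push runs git-receive-pack.

```go
package main

import (
	"fmt"
)

// remoteCommand returns the command string a git client asks ssh to run on
// the server for a given operation. It is a hypothetical helper for
// illustration, not part of git itself.
func remoteCommand(op, repoPath string) (string, error) {
	switch op {
	case "fetch", "pull", "clone":
		// The server uploads a pack to the client.
		return fmt.Sprintf("git-upload-pack '%s'", repoPath), nil
	case "push":
		// The server receives a pack from the client.
		return fmt.Sprintf("git-receive-pack '%s'", repoPath), nil
	default:
		return "", fmt.Errorf("unsupported operation: %q", op)
	}
}

func main() {
	for _, op := range []string{"pull", "push"} {
		cmd, err := remoteCommand(op, "alice/example.git")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Roughly what the client does: ssh git@host "<cmd>"
		fmt.Printf("%-4s -> %s\n", op, cmd)
	}
}
```

Everything after that command starts is just the two processes talking to each other over the ssh channel's stdin and stdout.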

Before we get to the sending and receiving of data over ssh, we need to authenticate the user and authorize their actions for that repo.

GitHub, and most large hosted git services, use a single system user for ssh. It’s commonly named git but could be named anything. We need every user that connects with their own public key over ssh to be able to authenticate as this git user. The first option is the world’s longest authorized_keys file, but as we mentioned above that isn’t gonna scale or fit into the deployment process of this service. Instead, in OpenSSH you can configure an AuthorizedKeysCommand. OpenSSH invokes this command with arguments that you specify, which can include, among other things, the user name and the fingerprint of the key used. The user name won’t do us any good because it’s always gonna be the same git user, but the fingerprint of the public key lets us look up the application user who owns that key. OpenSSH expects the AuthorizedKeysCommand to write the contents of an authorized_keys file to stdout, just generated dynamically based on the arguments we passed to it.
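To make the fingerprint lookup a bit more concrete, here is a minimal sketch of how a fingerprint could be computed when a user uploads a key, so it can be stored and matched later. It assumes sshd is using its default SHA256 fingerprint format, where the fingerprint is the unpadded base64 of the SHA-256 digest of the decoded key blob. The names fingerprintSHA256 and keyStore are hypothetical, and the in-memory map only stands in for the real public keys table.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// fingerprintSHA256 computes an OpenSSH-style SHA256 fingerprint from one
// authorized_keys-format line ("<type> <base64-blob> [comment]").
// Hypothetical helper; it does not handle lines with leading options.
func fingerprintSHA256(authorizedKeyLine string) (string, error) {
	fields := strings.Fields(authorizedKeyLine)
	if len(fields) < 2 {
		return "", errors.New("expected \"<type> <base64-blob> [comment]\"")
	}
	blob, err := base64.StdEncoding.DecodeString(fields[1])
	if err != nil {
		return "", fmt.Errorf("decoding key blob: %w", err)
	}
	sum := sha256.Sum256(blob)
	// The digest is printed as unpadded base64 with a "SHA256:" prefix.
	return "SHA256:" + base64.RawStdEncoding.EncodeToString(sum[:]), nil
}

// keyStore is an in-memory stand-in for the public keys table,
// mapping fingerprint -> application user id.
type keyStore map[string]string

// userByFingerprint returns the user that owns the key, or an error if
// no key with that fingerprint is registered.
func (s keyStore) userByFingerprint(fp string) (string, error) {
	user, ok := s[fp]
	if !ok {
		return "", fmt.Errorf("no user for fingerprint %s", fp)
	}
	return user, nil
}

func main() {
	// Placeholder blob, not a real key; any valid base64 works for the demo.
	blob := base64.StdEncoding.EncodeToString([]byte("not-a-real-key"))
	line := "ssh-ed25519 " + blob + " alice@laptop"

	fp, err := fingerprintSHA256(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	store := keyStore{fp: "alice"}

	user, err := store.userByFingerprint(fp)
	fmt.Println(fp, user, err)
}
```

Storing the fingerprint next to the key at upload time means the lookup at login is a single indexed query rather than a scan over every key.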

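In the sshd_config we set the AuthorizedKeysCommand (sshd also requires AuthorizedKeysCommandUser to be set alongside it, or it will refuse to start):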

AuthorizedKeysCommand /usr/local/bin/ultragist keys authorizedkeys --fingerprint=%f --dbpath=/data/ultragist.db

As we get some additional config, I’ll probably move it all to a config file rather than setting dbpath, but for now that is the path to the sqlite db.

In Go, we do a lookup for that fingerprint in a public keys table in our sqlite db to find the user. If we don’t find the user, we’ll return an error and won’t write any authorized keys, so OpenSSH will deny the connection.

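import "fmt"

// defaultOptions are applied to every key we hand back to sshd.
var defaultOptions = []string{
    "restrict",
}

// AuthorizedKeys prints an authorized_keys line for the key matching fingerprint.
func AuthorizedKeys(fingerprint string) error {
    key, err := GetKeyByFingerprint(fingerprint)
    if err != nil {
        // Nothing is printed, so sshd has no key to match and denies the login.
        return err
    }
    // Copy the defaults so appending never modifies the shared slice.
    // (copy(defaultOptions, options) copied nothing, which dropped "restrict".)
    options := append([]string{}, defaultOptions...)
    options = append(options, fmt.Sprintf("environment=\"UGUSER=%s\"", key.UserId))
    fmt.Print(string(MarshalAuthorizedKey(key.pk, options)))
    return nil
}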

If we do find the user, we’ll output their full public key and some ssh options, like restrict, which limits the capabilities they have access to over ssh. We also set their application username in an environment variable via OpenSSH options, which lets us associate future commands in this session with the user we just authorized. Note that OpenSSH ignores the environment option unless PermitUserEnvironment is enabled in sshd_config.
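For reference, the line that ends up on stdout has the shape "options keytype base64-key", with the options comma-separated. The sketch below shows one way to build it using only the standard library. The helpers authorizedKeyLine and userEnvOption are hypothetical stand-ins and not the actual MarshalAuthorizedKey from this project, and the key blob in main is a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// authorizedKeyLine joins options, key type, and base64 blob into a single
// authorized_keys line. Hypothetical helper for illustration only.
func authorizedKeyLine(options []string, keyType, blob string) string {
	var b strings.Builder
	if len(options) > 0 {
		// Options are comma-separated, with no spaces between them.
		b.WriteString(strings.Join(options, ","))
		b.WriteByte(' ')
	}
	b.WriteString(keyType)
	b.WriteByte(' ')
	b.WriteString(blob)
	b.WriteByte('\n')
	return b.String()
}

// userEnvOption builds the environment option that carries the user id.
// It is deliberately conservative and rejects characters that could break
// out of the quoted value or be confused with option separators.
func userEnvOption(userID string) (string, error) {
	if userID == "" || strings.ContainsAny(userID, "\"\\\n\r, ") {
		return "", fmt.Errorf("unsafe user id: %q", userID)
	}
	return fmt.Sprintf(`environment="UGUSER=%s"`, userID), nil
}

func main() {
	opt, err := userEnvOption("alice")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// "AAAAexample" is a placeholder, not a real key blob.
	fmt.Print(authorizedKeyLine([]string{"restrict", opt}, "ssh-ed25519", "AAAAexample"))
}
```

Validating the user id before it goes into the option matters because this line is interpreted by sshd, so a stray quote or newline in a username could otherwise change what the line means.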

OK, now we’ve let a user log in as git on the server so they can run git commands. How do we limit them to only running git commands, and only for repos they have access to? git-shell.

Using git-shell

git-shell is included with git as a very limited shell that lets users connecting over ssh invoke only the git commands needed for syncing data. To force the git system user to use a restricted shell like this, we can set it when creating the system user account. From the Dockerfile:

RUN useradd -p "*" --home=/data/gists/ -s /usr/local/bin/gist-shell git

We set the user’s home directory to a path that Docker will mount on a persistent volume on the VM this eventually runs on. It also sets the shell to gist-shell rather than git-shell. While git-shell does limit the commands we would allow the git system user to run, it can’t enforce application-level authorization for access to gists the user owns (read/write) or that are public (read only).

So, using a similar setup, gist-shell limits the commands that can be run and adds a basic level of authorization, for now just checking that the user we authorized is the same user in the path. We can add more functionality to that later, such as looking up authorization from the db.

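import (
    "fmt"
    "os"
    "os/exec"
)

// GistShell runs an allowed git command on behalf of the authorized user.
func GistShell(command string) error {
    allowed, parsedCommand, args := GistShellParse(command)
    if !allowed {
        return fmt.Errorf("command not allowed: %s", command)
    }
    // UGUSER is set by the environment option in the authorized keys output.
    userId, ok := os.LookupEnv("UGUSER")
    if !ok {
        return fmt.Errorf("no user id")
    }
    ok, err := GistShellAuthorization(userId, args)
    if err != nil {
        return err
    }
    if !ok {
        // Never return a nil error when access is denied.
        return fmt.Errorf("user %s not authorized", userId)
    }
    // args is assumed to be a []string here, so it is expanded with "...".
    cmd := exec.Command(parsedCommand, args...)
    // Wire the child to our stdio so the git client talks to it directly.
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    cmd.Stdin = os.Stdin
    // Run starts the command and waits for it to finish.
    return cmd.Run()
}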

We parse the command sent from the git client (or any client connected over ssh) to ensure it’s limited to the small set of commands we allow. We also parse the path to the repo, for now relying on the username from the session we already authorized matching the user part of the path to the gist repo. We’ll probably move away from using the username directly to some user id that is guaranteed to stay consistent even if usernames change, but this is just a proof of concept right now.
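The parsing and the path check described above could look something like the sketch below. This is not the actual GistShellParse or GistShellAuthorization; the names parseGitCommand and authorize, the two-entry allowlist, and the "user/gist.git" path layout are all assumptions made for illustration. It relies on the fact that sshd hands the requested command to the login shell as a single string (via -c), and that git clients wrap the repo path in single quotes.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// allowedCommands is a small allowlist of git transport commands.
var allowedCommands = map[string]bool{
	"git-upload-pack":  true, // client fetches or clones
	"git-receive-pack": true, // client pushes
}

// parseGitCommand splits a command such as
//
//	git-upload-pack 'alice/abc123.git'
//
// into the command name and a cleaned, relative repo path.
func parseGitCommand(command string) (string, string, error) {
	name, rest, found := strings.Cut(strings.TrimSpace(command), " ")
	if !found || !allowedCommands[name] {
		return "", "", fmt.Errorf("command not allowed: %q", command)
	}
	// Strip the single quotes the git client puts around the path.
	rest = strings.TrimSpace(rest)
	rest = strings.TrimSuffix(strings.TrimPrefix(rest, "'"), "'")
	if rest == "" || strings.ContainsAny(rest, "'\"\\ \n") {
		return "", "", errors.New("invalid repo path")
	}
	// Treat the path as relative to the git user's home and reject
	// anything that would climb out of it.
	cleaned := path.Clean(strings.TrimLeft(rest, "/"))
	if cleaned == "." || cleaned == ".." || strings.HasPrefix(cleaned, "../") {
		return "", "", errors.New("invalid repo path")
	}
	return name, cleaned, nil
}

// authorize reports whether userID owns the repo, assuming a
// "<user>/<gist>.git" layout.
func authorize(userID, repoPath string) bool {
	owner, gist, found := strings.Cut(repoPath, "/")
	return found && gist != "" && userID != "" && owner == userID
}

func main() {
	name, repo, err := parseGitCommand("git-receive-pack 'alice/abc123.git'")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, repo, authorize("alice", repo), authorize("bob", repo))
}
```

Cleaning the path before the ownership check is the important part: without it, something like alice/../bob/x.git would pass a naive prefix comparison while pointing at another user's repo.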

Now we can set a public key and user name in the db and push to a repo on the remote server. The server checks the public key fingerprint to resolve the user, then limits the commands that can be run and the repos that can be accessed based on that user.

What’s next

A work in progress of this, plus some more I haven’t written up yet, is on GitHub. The next part is probably a basic web server so we can view gists on the web. I’m looking at using git2go, which provides libgit2 bindings for Go.

References