Developer Deep Dive: dgit edition
We’re really proud of the work the team put into dgit (and of taking home a prize in the Sia hackathon), so we sat down (remotely, of course) with Brandon, dgit’s lead architect, to hear more about the tech and what makes dgit so special. Wes, his partner in crime on the project, makes a bonus appearance.
Walk us through how you built dgit.
The core of dgit is actually a fairly simple architecture, thanks to the awesome underlying tools it builds on. If we take a step back from dgit, a git repo is nothing more than a collection of files arranged in a DAG (directed acyclic graph). Layered on top of the DAG is a tiny index for entering the DAG at a specific point based on a reference, such as a branch.
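To make the "DAG plus a tiny index" picture concrete, here is a minimal sketch in Go. It is not dgit’s or git’s actual data model: the Commit and Repo types, the History method, and the example commit IDs are all hypothetical, chosen only to show how a reference gives you an entry point into the graph.

```go
package main

import "fmt"

// Commit is a simplified, hypothetical stand-in for a git commit object:
// it has an ID and points at zero or more parent commits.
type Commit struct {
	ID      string
	Parents []string
}

// Repo holds the object graph plus the small index of named references.
type Repo struct {
	Objects map[string]Commit // the DAG, keyed by object ID
	Refs    map[string]string // e.g. "refs/heads/main" -> commit ID
}

// History enters the DAG at the commit a ref points to and visits every
// reachable ancestor exactly once, in breadth-first order.
func (r *Repo) History(ref string) ([]string, error) {
	start, ok := r.Refs[ref]
	if !ok {
		return nil, fmt.Errorf("unknown ref %q", ref)
	}
	seen := map[string]bool{start: true}
	queue := []string{start}
	var order []string
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		for _, p := range r.Objects[id].Parents {
			if !seen[p] {
				seen[p] = true
				queue = append(queue, p)
			}
		}
	}
	return order, nil
}

// exampleRepo builds a four-commit graph: "d" merges "b" and "c",
// which both descend from the root commit "a".
func exampleRepo() *Repo {
	return &Repo{
		Objects: map[string]Commit{
			"a": {ID: "a"},
			"b": {ID: "b", Parents: []string{"a"}},
			"c": {ID: "c", Parents: []string{"a"}},
			"d": {ID: "d", Parents: []string{"b", "c"}},
		},
		Refs: map[string]string{
			"refs/heads/main":    "d",
			"refs/heads/feature": "c",
		},
	}
}

func main() {
	repo := exampleRepo()
	history, err := repo.History("refs/heads/main")
	if err != nil {
		panic(err)
	}
	fmt.Println(history)
}
```

The point of the sketch is the split: the Objects map can live anywhere that stores blobs, while the much smaller Refs map is the part whose integrity you need to trust.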
To provide these two functions as a git remote, dgit builds upon two decentralized systems: Tupelo and Sia. Tupelo is a DLT (distributed ledger technology) that is optimized for trust and ownership of objects rather than currency. In dgit, Tupelo’s ChainTrees are a perfect fit for managing and providing trust of the repo’s index. That index, of course, is only valuable if your git objects are accessible as well. Enter Sia. Sia is a decentralized storage network that leverages blockchain technology to create a data storage marketplace that is more robust and more affordable than traditional cloud storage providers. dgit stores the entirety of the git repo’s DAG inside of Sia, just like it would be on your local file system.
What problem(s) were you trying to solve?
Being in the blockchain ecosystem, we are always looking for apps that are a good fit for blockchain’s strengths in security and transparency. Moving away from a centralized authority increases the security of your data. GitHub currently holds the keys to all our repos and, whether unintentionally or via a bad actor, could modify those repos and cause harm.
Furthermore, a public blockchain provides transparency and auditing capabilities, not from some separate service or feed, but directly from the same authority presenting the repo. As we considered what a decentralized git remote looked like, we got really excited because git already builds upon distributed principles. We see a decentralized repo registry as the next evolution in the git ecosystem.
What are the use-cases for dgit that you think are most compelling?
In the short to medium term, I see a lot of value in having repos pushed to dgit for auditing purposes. I know at my previous organization, one of the concerns was malicious, covert modification of the code within our repos. With dgit, since each push can simultaneously go to the Tupelo network, you essentially have a publicly recorded, immutable changelog and checksum for your repo.
Why is blockchain/decentralization important here?
(Wes) In many ways the relationship between git and GitHub, and the way they have co-evolved, is a fractal of the web itself. What started out as a decentralized, democratizing system has been effectively re-centralized by locking useful features up inside walled gardens where we become products sold to advertisers rather than users. Changing that is hard under any circumstances, but on top of that the vast majority of decentralized technologies are much more complex, slower, and unpredictably expensive compared to their traditional counterparts. dgit is a great demonstration that this need not be the case, and that second-generation decentralized tools like Tupelo and Sia can alleviate the unnecessary complexity and cost of existing blockchain platforms.
Is an understanding of blockchain required?
Nope! Though if you’ve been in the blockchain community for a while and you’ve used other DApps (decentralized apps), you’ll be blown away by how easy and fast dgit is.
Does someone’s project need to be on blockchain for a dev to get value from dgit?
Not at all. Any ol’ repo works great on dgit: your repo of cat gifs, your dotfiles, your blockchain app, anything you would normally push up to a public GitHub repo.
What is dgit compatible with?
dgit hooks directly into git, so anything that runs git is compatible. Currently only 64-bit macOS and Linux binaries are published, but dgit is written in Go, so it can be built from source for other platforms.
(Wes) As always, Windows support is the tricky outlier here. None of us use it except for testing cross-platform code, and we haven’t gotten to that yet with dgit. But in theory it should be doable down the road if people are interested in it.
How was the experience of using Sia for storage?
(Wes) Using Sia was pretty straightforward. They had a Go library for uploading and downloading to/from Skynet already available. It was focused on using files directly so I had to modify it to accept/return io.Readers instead. But that’s what makes open source great, right? There’s an open pull request to bring that refactor into the library as of my writing this. But overall it was super simple to work with.
Decentralized tools can sometimes be challenging compared to their centralized counterparts, but Sia has done a great job with Skynet. When you upload you get a hash back and you can download your data later using that hash. It’s as simple as that. So I simply store those hashes in Tupelo ChainTrees with some metadata indicating that they are Sia Skylinks and then download them later when something needs those objects.
I also took advantage of Go’s CSP-style concurrency features (goroutines and channels) to upload multiple objects in parallel once the initial serial version was working correctly. There is a lot of additional room for performance optimizations in that part of dgit, so hopefully it will only get faster from here. The biggest one is probably uploading git packfiles directly to Skynet instead of parsing the individual objects out of them first, though we will have to benchmark that against the parallelized object uploading to make sure it actually speeds things up.
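One common way to parallelize uploads like this in Go is a bounded pool of goroutines. The sketch below shows that general pattern only; it is not taken from dgit. UploadFunc, UploadAll, and fakeUpload are hypothetical names, and the fake upload simply derives a link from the object length so the example runs without a network.

```go
package main

import (
	"fmt"
	"sync"
)

// UploadFunc is a hypothetical upload call: object bytes in, link out.
type UploadFunc func(data []byte) (string, error)

// UploadAll uploads every object concurrently, with at most `workers`
// uploads in flight. Links come back in the same order as the inputs.
// If any upload fails, the first error (by input order) is returned.
func UploadAll(objects [][]byte, workers int, upload UploadFunc) ([]string, error) {
	if workers < 1 {
		workers = 1
	}
	links := make([]string, len(objects))
	errs := make([]error, len(objects))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup

	for i, obj := range objects {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` uploads are running
		go func(i int, obj []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only to its own slot, so no lock is needed.
			links[i], errs[i] = upload(obj)
		}(i, obj)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return links, nil
}

// fakeUpload is a stand-in for a real network upload.
func fakeUpload(data []byte) (string, error) {
	if len(data) == 0 {
		return "", fmt.Errorf("refusing to upload empty object")
	}
	return fmt.Sprintf("link-%d", len(data)), nil
}

func main() {
	objects := [][]byte{[]byte("a"), []byte("bb"), []byte("ccc")}
	links, err := UploadAll(objects, 2, fakeUpload)
	if err != nil {
		panic(err)
	}
	fmt.Println(links)
}
```

Capping the number of in-flight uploads keeps a large push from opening thousands of connections at once, and the cap is exactly the kind of knob you would want to tune when benchmarking against a packfile-based approach.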
What does the future of dgit include?
Our near-term roadmap is focused on adding collaboration tools to dgit, since that’s really the whole purpose of having a git remote. Past that, we’ll be looking at what unique features can be added to dgit that are currently impossible with a centralized remote repository. Long term, a fully decentralized, browser-based, GitHub-style UI sounds pretty great (and is totally possible with Tupelo).
What are you most excited about with dgit?
There are two facets here that really get me excited.
For dgit as a product itself, its roadmap is bright and full of promise. Having a git repo on a blockchain like Tupelo opens up a ton of possibilities that aren’t available in a traditional SaaS model. One of those features I’m personally stoked about is the ability to program bounties or rewards for open source contributions into the repo itself, all secured and validated by the Tupelo blockchain.
The other facet that excites me is simply how this is a great display of how to sprinkle in blockchain for the value it provides and specifically how easy it is to do that on Tupelo. Since Tupelo separates trust from data, it makes it trivial to add in trust to existing data.
Ready to try it for yourself?