Monday, November 24, 2014

Re: Obtaining content from Git


On Mon, Nov 24, 2014 at 5:06 PM, martin f krafft <madduck@madduck.net> wrote:
also sprach Russell Keith-Magee <russell@keith-magee.com> [2014-11-24 07:16 +0100]:
> The right place is in the view.
[…]
> If you want to do "interesting" things like retrieve a specific
> file version, you're not going to be able to use the storage API,
> because most raw

Interesting perspective you have there, which I can follow very
well. Thanks!

But let me dig a bit further, nevertheless:

> The storage layer is an abstraction to enable reusable apps. If
> I write a "user profile" app that needs to store an avatar, then
> I don't care where that avatar comes from, as long as it's
> available. It could be on a local filesystem; it could be on
> a cloud file store (like S3). The storage API lets me do that.

It could come from Git! After all, Git is really a database by
itself, just like the filesystem could be viewed as a database. So
somewhere in my idealist brain there's this theory that I could just
add a Git-storage-plugin to the storage layer and that would cause
content to be fetched from Git auto-magically if the
request-for-data by the view (not the HTTP request) matches certain
criteria and those data are available in Git. If not, then the
lookup goes to the next storage plugin.

So my view might call some sort of lookup() function, e.g.

  lookup_description(event_id)

which just returns a string. Ordinarily, this string comes from
pgsql at the moment. What I'd like to see is a way to override this
function so that instead the data are fetched from
  git.example.org/myrepo.git, branch=live, file=events/$event_id

Obviously, I could just as well call the Git-specific stuff straight
from the view, but that would inevitably tie the view to Git. Maybe
this is what I want, I am just trying to explain how I approached
this and what led me to ask the question the way I did.
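One way to keep the view ignorant of where the text lives is to make `lookup_description()` walk a list of sources and return the first hit, which is the fall-through behaviour described above. This is a minimal plain-Python sketch; the `Source` type, `dict_source()` and `SOURCES` are hypothetical stand-ins, and a Git-backed source would just be one more callable in the list:

```python
from typing import Callable, Iterable, Optional

# Hypothetical interface (not a Django API): a source takes an event id and
# returns the description, or None if it doesn't have one.
Source = Callable[[int], Optional[str]]


def lookup_description(event_id: int, sources: Iterable[Source]) -> str:
    """Return the description from the first source that has one."""
    for source in sources:
        text = source(event_id)
        if text is not None:
            return text
    raise LookupError(f"no description for event {event_id}")


def dict_source(data: dict) -> Source:
    # Stand-in for the existing PostgreSQL query or a read from Git.
    return lambda event_id: data.get(event_id)


# The view only ever calls lookup_description(); adding, removing or
# reordering sources doesn't touch it.
git_like = dict_source({1: "from git"})
database = dict_source({1: "from pgsql", 2: "only in pgsql"})
SOURCES = [git_like, database]
```

Whether the list is ordered Git-first or database-first is a policy decision that lives in one place rather than in every view.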

Perhaps I wasn't clear. Of course an avatar could come from a git store. My point is that it's not a natural mapping. A file system is a mapping between a path name and a binary blob. Git has an additional layer - the path name, *plus* a historical version (referenced either by a hash, or a date). And you can't arbitrarily write to any point in a Git tree. Ok... you can if you're willing to have orphan nodes, but if you're thinking about version control in the traditional sense, a git repository can only be written to at the "end" of history, but read from any point in its history.

You might be able to write a mapping to Storage that did some sort of naming trick (e.g., split a colon-separated suffix off the end of a filename on read, so you could request "/my/file.txt:deadbeef" to get the "deadbeef" revision of /my/file.txt) - but my point is that the storage API doesn't contain a *natural* endpoint for version information.
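To make the naming trick concrete, here is a hypothetical pair of helpers that parse a "path:rev" name and shell out to `git show <rev>:<path>`. It assumes the `git` binary is on PATH and that `repo_dir` is a clone; a real Django `Storage` subclass would have to wrap something like this in its `_open()` method. The parser also shows the catch: any real filename that contains a colon becomes ambiguous.

```python
import subprocess
from typing import Optional, Tuple


def split_versioned_name(name: str) -> Tuple[str, Optional[str]]:
    """Split "/my/file.txt:deadbeef" into ("my/file.txt", "deadbeef").

    Names without a ":<rev>" suffix get rev=None. A filename that itself
    contains ":" would be misparsed - that's the unnatural part.
    """
    path, sep, rev = name.rpartition(":")
    if not sep:
        path, rev = name, None
    # In Git's "<rev>:<path>" syntax the path is relative to the repo root.
    return path.lstrip("/"), rev or None


def read_versioned(repo_dir: str, name: str) -> bytes:
    """Return the blob for `name` at the requested revision (HEAD by default)."""
    path, rev = split_versioned_name(name)
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{rev or 'HEAD'}:{path}"],
        capture_output=True,
        check=True,
    )
    return result.stdout
```

Everything version-related is squeezed through a string convention, which is exactly the "buried features" problem described next.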

And, even if you did this, what you're going to get is a view of your Git repository that tries *really* hard to make it look like just a normal file system. All the interesting features will, by necessity of the interface, be buried. If you actually want to do anything interesting based on the fact that you're using a git store under the hood, you're better served working closer to the metal.

> If you're just looking to read the *current* value in a git
> repository, then just use normal filesystem storage over a normal
> git checkout.

Yeah, this might well be the best option, also in terms of
performance, unless I want to keep track of blob hashes to avoid
doing the whole branch→tree→blob lookup every time, which the
filesystem "caches" for me.

Do you know of a simple example that fetches data from the
filesystem? I am being lazy here as I am sure there's plenty, so
feel free to RTFM me! ;) However, maybe you know a very good example
that makes everything so clear and then I'd really appreciate this
over wading through the various means. I know Django has provisions
for serving static files, but somehow it seems like that's not what
I want… (since the files are not actually static and a given path
could return different data on subsequent calls)

A simple example that fetches data from the filesystem? Sure:

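from django.shortcuts import render

def myview(request):
    # Plain Python file I/O; nothing Django-specific is needed to read it.
    with open('filename.txt') as datafile:
        data = datafile.read()
    return render(request, 'my_template.html', {'data': data})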

You're possibly getting lost by thinking that this is a "Django" thing - it isn't. Basic Python is always an option. 
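Applied to the events/$event_id layout mentioned earlier, a plain-Python read from a checkout might look like the sketch below. The helper name and the `checkout_dir` argument are hypothetical; the containment check is there because the event id will usually come from the URL, and a path like "../../etc/passwd" must not escape the working copy.

```python
import os


def read_from_checkout(checkout_dir: str, relpath: str) -> str:
    """Read a text file from a Git working copy, refusing paths outside it."""
    root = os.path.realpath(checkout_dir)
    target = os.path.realpath(os.path.join(root, relpath))
    # realpath resolves "..", symlinks and absolute paths before the check.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"path escapes checkout: {relpath!r}")
    with open(target, encoding="utf-8") as datafile:
        return datafile.read()
```

A view would then call something like `read_from_checkout(CHECKOUT, f"events/{event_id}")`, where `CHECKOUT` is whatever setting points at the clone (again, a hypothetical name).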

Furthermore, I'd actually like to post-process the data. The Git
repo would contain reStructuredText files and I'd like to render
them before filling the result into template slots.

This makes me think that there ought to be a cache involved at this
level. Sure, I could make a simple expiration-based cache, but I'd
really rather make the cache depend on the mtime of the underlying
source file on the filesystem (but include the post-processing step
in between).
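A cache keyed on the source file's mtime, with the reStructuredText rendering step inside it, can be sketched in a few lines. This assumes docutils for the rendering (`docutils.core.publish_parts`, imported lazily so any renderer can be passed in); `rendered()` and the module-level dict are hypothetical. In a Django project the dict could be swapped for `django.core.cache.cache.get()`/`set()` with the mtime folded into the key.

```python
import os
from typing import Callable, Dict, Tuple


def render_rst(source: str) -> str:
    # docutils is the reference reStructuredText implementation.
    from docutils.core import publish_parts
    return publish_parts(source, writer_name="html")["html_body"]


# path -> (mtime in nanoseconds, rendered HTML); per-process only.
_cache: Dict[str, Tuple[int, str]] = {}


def rendered(path: str, render: Callable[[str], str] = render_rst) -> str:
    """Return render(contents of path), re-rendering only when mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with open(path, encoding="utf-8") as f:
        html = render(f.read())
    _cache[path] = (mtime, html)
    return html
```

Using `st_mtime_ns` rather than whole seconds narrows, but does not eliminate, the window in which two quick edits share a timestamp.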

Sure - so you can cache the result of the ReST rendering call. The result of any function call can be passed to Django's cache primitives, or the {% cache %} template tag can be used to do this at the template level. However, if you want to do this based on mtime, there are a bunch of things you can do with nginx and/or Varnish to do *response*-level caching: if you exploit the HTTP Last-Modified and/or ETag headers, you can get your browser and webserver to do the caching for you.
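Django already ships decorators built around these headers (`condition`, `last_modified` and `etag` in `django.views.decorators.http`), so in a real project those are what to reach for. The stdlib-only sketch below just shows the comparison the headers drive; `validators_for()` and `not_modified()` are hypothetical names, the headers argument is a plain case-sensitive dict, and weak `W/"..."` validators are deliberately not handled.

```python
import email.utils
import os
from typing import Mapping, Tuple


def validators_for(path: str) -> Tuple[str, str]:
    """Build an ETag and a Last-Modified value from a file's stat info."""
    st = os.stat(path)
    etag = '"%x-%x"' % (st.st_mtime_ns, st.st_size)
    last_modified = email.utils.formatdate(st.st_mtime, usegmt=True)
    return etag, last_modified


def not_modified(headers: Mapping[str, str], etag: str, last_modified: str) -> bool:
    """True if the client's cached copy is still valid (answer with 304)."""
    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return "*" in tags or etag in tags
    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = email.utils.parsedate_to_datetime(if_modified_since)
            modified = email.utils.parsedate_to_datetime(last_modified)
            return modified <= since
        except (TypeError, ValueError):
            # Unparseable date, or naive/aware mismatch: treat as modified.
            return False
    return False
```

When `not_modified()` is true the server can reply 304 with no body, so neither the file read nor the ReST rendering has to happen.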
 
Have you seen something like this done? Is this also still best done
in a view?
 
Well... yes, I've seen this done before. It's called GitHub :-)
 
Yours
Russ Magee %-)

--
You received this message because you are subscribed to the Google Groups "Django users" group.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CAJxq84_skbwjfguCo%2BLaS4XrWdSLaYavBJXOKCE%3DhrzM5A-BTA%40mail.gmail.com.