Monday, February 1, 2016

Re: Kind'a TL, but please DR - Need your thoughts

So this is effectively a feed aggregation engine. I would recommend having a separate daemon running per media source, so that issues with one media source do not affect the operations of another.

I never would have thought of this application as a feed aggregation engine, and I'm not really sure it fits the definition. I will be digging deeper into this.
 

It would be possible to do everything with one daemon, but would be much trickier to implement.

I agree 120%

 

A second(?) python daemon would be waiting for those messages to be in the DB, process them, act according to the objective of the application, and update the DB as expected. This process(es) might include complicated and numerous mathematical calculations, which might take seconds or even minutes to run.


Implementation here is less critical than your workflow design.

I agree, yet this is the heart of my application. I understand it basically only involves the (web) application and the DBMS w/o any other external element. It is here where the whole shebang happens, but it might just be the DB application programmer in me.

This could be implemented as a simple cron script on the host that runs every few minutes. The trick is to determine whether or not a) records have already been processed, b) certain records are currently processing, c) records are available that have yet to be processed/examined. You can use extra DB columns with the data to flag whether or not a process has already started examining that row, so any subsequent calls to look for new data can ignore those rows, even if the data hasn't finished processing. 

You gave me half my code there, but I'm not sure I want to trust a cron job for that. I know there are plenty of other options to do the dirty laundry here, such as queues, signals, sub-processes (and others?) but I kind'a feel comfortable leaving that communication exchange to the DBMS events as I see it; who would know better when 'something' happened but the DBMS itself?

The reason I want to do the application using Django is that all this HAS to have multiple web interfaces and, at the end of the day, most media will come through the web and have to be processed as HTTP requests. Also, Django gives me a framework to keep this work better organized and clean, and I can make the application(s) DB agnostic.


 
What do you mean by 'multiple web interfaces'? You mean multiple daemons running on different listening ports? Different sites using the sites framework? End-user browser vs. API? 

A combination of all that and probably a bit more ... This is something I left out trying to evade the TL;DNR responses: I'm considering having this app return nothing but JSON or XML for other applications to "feed" from it (here is that feed word again!). There are a myriad of possible ways this application can be used. This, BTW, would leave all the HTML/CSS/JavaScript/etc "problems" to someone else ... it might just be the DB app programmer in me trying to avoid dealing with web issues, or I might just be trying to make things harder for me; this is something I haven't really thought much about.


Wanting the application to be DB agnostic does not mean that I don't have a choice: I know I have many options to communicate among different python processes, but I prefer to leave that to the DBMS. Of the open source DBMS I know of, only Firebird and PostgreSQL have events that can provide the communication between all the processes involved. I was able to create a very similar application in 2012 with Firebird, but this time I am being restricted to PostgreSQL, which I don't oppose at all. That application did not involve http requests.


Prefer to leave what to the DBMS? The DBMS is responsible for storing and indexing data, not process management. Some DBMS' may have some tricks to perform such tasks, but I wouldn't necessarily want to rely on them unless really necessary. If you're going to the trouble of writing separate listening daemons, then they can talk to whatever backend you choose with the right drivers.

I understand I'm having the DBMS do some of the process management, but it only goes as far as letting other processes know there is some job to be done, not even what needs to be done. I don't think the overhead on the DBMS is going to be all that big.
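For what it's worth, the "wake up, there is a job" idea could be sketched roughly like this with PostgreSQL's LISTEN/NOTIFY and psycopg2. This is only a sketch under assumptions: the channel name `new_work`, the `on_wakeup` callback, and the DSN are made up for illustration, and the sending side (a plain `NOTIFY new_work;`, for example from a trigger) is not shown. The notification carries no job details; the listener is expected to go and query the DB for whatever is pending.

```python
import select

CHANNEL = "new_work"  # hypothetical channel name


def drain_notifications(conn):
    """Return pending (channel, payload) pairs from a psycopg2 connection."""
    conn.poll()  # lets the driver read whatever the server has sent
    events = []
    while conn.notifies:
        note = conn.notifies.pop(0)
        events.append((note.channel, note.payload))
    return events


def listen_forever(dsn, on_wakeup, timeout=5.0):
    """Block on the DB socket and call on_wakeup() for each notification."""
    # Imported here so the helper above can be used without the driver.
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    # LISTEN/NOTIFY is delivered between transactions, so use autocommit.
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("LISTEN " + CHANNEL + ";")
    while True:
        ready, _, _ = select.select([conn], [], [], timeout)
        if not ready:
            continue  # timed out; a periodic sweep could also go here
        for channel, _payload in drain_notifications(conn):
            on_wakeup(channel)  # the worker then looks up the actual jobs
```

Since notifications are only delivered to sessions that are connected and listening at the time, a daemon written this way would probably still want to scan for unprocessed rows on startup.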

This whole application is an idea that's been in my mind for some 7 years now. I even got as far as having a working prototype. I was just starting to learn Python then and my code is a shameful non-pythonic mess. But it worked. I used Firebird as my RDBMS, and all feeds (again?) would come in and out through an ad-hoc gmail account (with google voice for SMS messaging). I would get the input, process it and return the output within 10 to 40 seconds, with the average at around 20, which is satisfying if you consider the app is not really controlling the "medium". Of course, I never even considered any heavy testing as there were many limitations, the 500 outgoing messages per day being just the first one. It just proved my concept, and served as a very good (and long) exercise in Python.

I recently shared my thoughts with some close friends that linger around other branches of (IT related) knowledge and they liked the idea, hence the request for your input, for which I feel very much obliged.

Thanks a BUNCH!

============
DISCLAIMER!
============
I do not mean to argue with any of the ideas you and all the others have shared with me. On the contrary, you have fed my curiosity even more, and curiosity well managed usually turns into knowledge. I can't do anything other than thank all of you for that gift.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/8c8edd31-8576-4ea4-834c-d0063fe0ef34%40googlegroups.com.
