Archive

Posts Tagged ‘json’

On sqlalchemy – what do you put in your base model?

My previous post regarded the reflective use of sqlalchemy, and there I showed how to set up a codebase for BaseModel and schema mapper. I put only one method in my BaseModel, and that was actually the property session, which returns the session that holds the instance (None if the instance is not added).

Now, before I continue, just a short description of this test database. The director table holds the movie directors, lastname is required and unique. The movie table holds the movies, movie field is required and unique, director_id is the link to the director (can be null, for unknown/not-added director), and year – also optional. The genre table holds just the genre names, and the movie_genre is only association table. Before you continue, you should check the mappers we added in the previous post.

There is an easy way to view all the information packed in a list with a coupled query, but we’ll do it the sqlalchemy’s way, that is, we’ll query it with its orm.

from sqlalchemy.orm import eagerload_all
movies=session.query(Movie)\
...............options(eagerload_all(Movie.director, Movie.genres))\
...............limit(10).all()

Notice that we added an option called eagerload_all that specifies which mapped properties of the queried objects should be loaded on the fly into the sessions – the so-called eagerloading. Use this if and only if you are certain that you will need this properties. We added limit of 10 only for clarity. If I print these objects like this:

for m in movies:
....print "\"%s\" (%s) by %s %s (%s)" \
..........% (m.movie, m.year, m.director.firstname, m.director.lastname,
..........", ".join([g.genre for g in m.genres]))

I get my sample database:

"A Clockwork Orange" (1971) by Stanley Kubrick (Sci-fi)
"Dr. Strangelove" (None) by Stanley Kubrick (Comedy)
"2001: A Space Oddisey" (1969) by Stanley Kubrick (Post-modern, Sci-fi)
"Lolita" (None) by Stanley Kubrick ()
"Shadow of a Doubt" (None) by Alfred Hitchcock (Noir, Mystery)
"Vertigo" (1957) by Alfred Hitchcock (Mystery)
"Citizen Kane" (1941) by Orson Welles ()
"Europa" (1991) by Lars von Trier ()
"Annie Hall" (None) by Woody Allen (Comedy)
"Element of Crime" (1984) by Lars von Trier ()

Now that we have a test data, I’ll continue with my BaseModel.

There is always the question of what kind of foreign key we should use. We can agree that incremental unsigned integers for every table is a good way of efficiently linking the tables, removing the complexity of using composite foreign keys or keys with slow look-up, but sometimes people tend to put these id-s into the application. Surely, the director with id 4 has bears no information, but when you say director ‘Hitchcock’ you get a meaningful data in your application. So, my approach is – keep the id-s in your database and out of the application. Use ‘natural’ key to refer to records from the application.

If I proceed in this manner, it would be a good thing to have a method load() associated to each model that can return a unique record/object (if exists) given a set of arguments. Retrieving unique records with sqlalchemy is very easy, sqlalchemy’s quering mechanism has a method one() that in the background makes a LIMIT 2: if there are two records found – throws MultipleResultsFound; if there are no results – throws NoResultsFound; and if it finds one – it returns it. Here’s how:

d = session.query(Director)\
............filter(Director.lastname=='Hitchcock').one()
print d.firstname

One genuine feature of the python language are the class methods, which implicitely take the class as the first parameter. So, instead of doing this:

class Director(BaseModel):
....@staticmethod
....def load(session, lastname):
........return session.query(Director)\
.......................filter(Director.lastname==lastname)\
.......................one()

which we ought to do for every model, we can put something more generic in the BaseModel:

....@classmethod
....def load(cls, session, **kwargs):
........q = session.query(cls)
........filters = [getattr(cls, field_name)==kwargs[field_name] \
...................for field_name in kwargs]
........return q.filter(and_(*filters)).one()

With this – we get static load method for all of the models! Sorry, but I don’t think you can do a similar thing in some other language – please note me if one can. The tricky part in this method is the list comprehension, but it is quite obvious, we just ‘compile’ a list of filters for every named argument. For example, for the Movie model, the call:

Movie.load(session, movie='Europa', year=1991)

translates the list comprehension into:

[Movie.movie=='Europa', Movie.year==1991]

Later, we pass the list as an argument-list to and_() (from sqlalchemy.sql.expression) which compiles filter clause of these two filters. The cls argument is actually the Movie class/model, which is implicitly passed when the method is called.

In fact, a very basic method is a class method. Actually, this is also one of the python’s unique features: the __new__ method. This one returns the object if it is not overridden, and if it is, it may not return the object, it may return some specific object, or the default/desired object. If it returns some object, the __init__ method will be called upon the newly created object. This method is actually quite good at constructing the singleton pattern. No one will try to put a BaseModel instance into a session, but anyway, we’ll forbid it:

def __new__(cls):
....if cls==BaseModel:
........raise TypeError, "You can not instantiate a BaseModel"
....return super(BaseModel, cls).__new__(cls)

So, we raise a TypeError if we try to create a BaseModel instance, and if it is an instance from a child class, we simply return such instance.

For deleting a single object (using the session), sqlalchemy provides:

session.delete(obj)

We would want to provide class access for deletion, simply:

def drop(self):
....self.session.delete(self)

where one can see the usefulness of having a session property from the instance. To use:

d = Director.load(session, lastname='Hitchcock')
d.drop()

My next item in the BaseModel is a serializer. I’ll make a json serialize method. I simply will serialize the dictionary of the object without the
_sa_instance_state attribute which can not be encoded. Simply:

def serialize(self):
....tmp = self.__dict__.copy()
....del tmp['_sa_instance_state']
....return simplejson.dumps(tmp)

Finally, I needed a specific method for one model in a project, and it found its way to the BaseModel. Namely, I want to what is the position of a record in a table when I sort by a particular column, and it is actually the count of records which satisfy the condition that their values are smaller than the value of the particular record with regard of the sorting column. I can put this in my BaseModel:

def position(self, order):
....cls = self.__class__
....return self.session.query(cls)\
................filter(getattr(cls, order)<getattr(self, order))\
................count()

The __class__ property of the instance simply returns the particular model.

The next post on sqlalchemy will probably have some methods and properties in the specific models – all done with sqlalchemy’s orm querying – just some stuff I had to leard with trial and error. But I think my next post will deal with pyparsing again, and I will parse in a different way than that time.