<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Almost a program...</title>
	<atom:link href="http://petrushev.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://petrushev.wordpress.com</link>
	<description>random thoughts on programming</description>
	<lastBuildDate>Tue, 11 Jan 2011 09:22:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='petrushev.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Almost a program...</title>
		<link>http://petrushev.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://petrushev.wordpress.com/osd.xml" title="Almost a program..." />
	<atom:link rel='hub' href='http://petrushev.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Casual mysql fail &#8211; careful with that group_concat</title>
		<link>http://petrushev.wordpress.com/2010/10/04/casual-mysql-fail-group_concat/</link>
		<comments>http://petrushev.wordpress.com/2010/10/04/casual-mysql-fail-group_concat/#comments</comments>
		<pubDate>Mon, 04 Oct 2010 09:08:26 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[rdbms]]></category>
		<category><![CDATA[reliability]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=90</guid>
		<description><![CDATA[I was doing a minor maintenance job on a non-normalized table which required large grouping and gathering of id&#8217;s, which in turn were to be processed by another routine. Each group would return a comma-separated list using mysql&#8217;s group_concat. Yet, some groups failed to do so, resulting in strings that ended with a comma (this [...]]]></description>
			<content:encoded><![CDATA[<p>I was doing a minor maintenance job on a non-normalized table which required large grouping and gathering of id&#8217;s, which in turn were to be processed by another routine. Each group would return a comma-separated list using mysql&#8217;s group_concat. Yet, some groups failed to do so, resulting in strings that ended with a comma (this bug was actually very hard to trace).</p>
<p>As it turns out, the <a href="http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_group_concat_max_len" target="_blank">group_concat resulting string has a limit</a>! Now, let&#8217;s say for a moment that this was acceptable RDBMS design (it is not!) &#8211; the server does not throw an exception on exceeding this limit, nor does it give any warning &#8211; which was the reason for some tough debugging. It simply truncated the result into a shorter list!</p>
<p>That is plain wrong!</p>
<p>In no imaginable scenario is an aggregate function allowed to return a false result. I really don&#8217;t care if you make my server swap or make me wait an hour &#8211; the most important thing is that my query gets the correct data. No optimization is ever more valuable than reliability and getting the right data.</p>
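<p>Until the limit is raised (for example with SET SESSION group_concat_max_len), the application can at least refuse results that look cut off. The following is only a minimal sketch: the function name split_ids is hypothetical, it assumes the ids are integers joined by the default comma separator, and it assumes the server still uses the documented default limit of 1024 bytes.</p>

```python
DEFAULT_GROUP_CONCAT_MAX_LEN = 1024  # MySQL's documented default, in bytes


def split_ids(concatenated, max_len=DEFAULT_GROUP_CONCAT_MAX_LEN):
    """Split a group_concat result into ids, refusing suspicious input.

    Hypothetical helper: a result that fills the whole limit, or that ends
    with the separator, is treated as possibly truncated.
    """
    size = len(concatenated.encode("utf-8"))
    if size >= max_len:
        raise ValueError("result is %d bytes long; it may be truncated" % size)
    if concatenated.endswith(","):
        raise ValueError("result ends with a separator; probably truncated")
    return [int(part) for part in concatenated.split(",")]


print(split_ids("3,14,15"))
```

<p>The check is deliberately pessimistic: a result that is exactly as long as the limit may be complete, but there is no way to tell from the string alone.</p>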
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/mysql/'>mysql</a>, <a href='http://petrushev.wordpress.com/tag/rdbms/'>rdbms</a>, <a href='http://petrushev.wordpress.com/tag/reliability/'>reliability</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/10/04/casual-mysql-fail-group_concat/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
		<item>
		<title>Try to use the relational database to its full potential</title>
		<link>http://petrushev.wordpress.com/2010/09/24/rdbms-over-search-engines/</link>
		<comments>http://petrushev.wordpress.com/2010/09/24/rdbms-over-search-engines/#comments</comments>
		<pubDate>Fri, 24 Sep 2010 11:42:15 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[rdbms]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[sphinxsearch]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=83</guid>
		<description><![CDATA[or: How the search engines did not do for me what an RDBMS did. It will be a quick post, however. For my latest project I have a highly relational schema for a database on a PostgreSQL server. The search API steadily grew more complicated and I finally had the chance to install and utilize the Apache Solr search engine. To [...]]]></description>
			<content:encoded><![CDATA[<p>or:</p>
<h3>How the search engines did not do for me what an RDBMS did.</h3>
<p>It will be a quick post, however.</p>
<p>For my latest project I have a highly relational schema for a database on a <a href="http://www.postgresql.org/" target="_blank">PostgreSQL</a> server. The search API steadily grew more complicated and I finally had the chance to install and utilize the <a href="http://lucene.apache.org/solr/" target="_blank">Apache Solr</a> search engine. To simplify matters for this post, let&#8217;s say that for each document I had a short title, a category, two full-text fields, and two different sets of tags. I had a pretty good idea how to set up solr and its data-import-handler, how the delta import would work, how the schema would be designed, even a faceting strategy for my web interface. I did a lot of work on some of this, and then I got stuck.</p>
<p>You cannot query the solr fields with *text*, that is, match a substring anywhere inside a word.</p>
<p>This was a fail. It burned my time. I understand why the solr people didn&#8217;t implement such a thing. As a user, I would not want to search for a part of a word inside words. However, this was not only something the client insisted on; the nature of the application demanded such lookups. I had to have *text* queries.</p>
<p>A colleague of mine strongly advocates <a href="http://www.sphinxsearch.com/" target="_blank">sphinxsearch</a>. I am aware it has good performance, and you can also work out different sorts of matching, *text* included. But very soon I gave up &#8211; it had very poor support for delta imports.</p>
<p>Then I decided to use brute force. I created an algorithm that packs each multi-valued field into a single field, plus a method for querying it, and then put all the documents to be indexed into one big postgresql table, one row per document. I made a cron job for delta updates on documents, and switched the search API from the complex joins to a simple query on the big table. Of course, I created a whole bunch of different simple and composite indices on the big table for this occasion <em>after</em> the initial population.</p>
<p>One cannot expect the same query speed as with solr, but the speed is satisfactory. And, most importantly, I retained the ability to query what I like, and to update in an optimized manner. Finally, I suppose the lesson of this task is that you should not rush to add more technologies to your product; first try to use the tools you already have, in this case your RDBMS. Sometimes even the most complicated problems have very simple solutions.</p>
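<p>The post does not show the actual algorithm, so the following is only a guess at its shape, not the code that was used. It sketches one way to pack a multi-valued field into a single delimited string and to build a *text*-style pattern for a SQL LIKE query. The names flatten and like_pattern and the | delimiter are assumptions made for illustration; the backslash is PostgreSQL's default LIKE escape character.</p>

```python
def flatten(values, sep="|"):
    """Pack a multi-valued field (e.g. a set of tags) into one string."""
    return sep + sep.join(values) + sep


def like_pattern(term):
    """Build a *text*-style LIKE pattern, escaping LIKE's own wildcards."""
    escaped = (term.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_"))
    return "%" + escaped + "%"


print(flatten(["python", "postgresql"]))
print(like_pattern("gres"))
```

<p>The pattern would then be passed as a bound parameter to a query on the big table, never pasted into the SQL string.</p>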
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/postgresql/'>postgresql</a>, <a href='http://petrushev.wordpress.com/tag/rdbms/'>rdbms</a>, <a href='http://petrushev.wordpress.com/tag/search-engine/'>search engine</a>, <a href='http://petrushev.wordpress.com/tag/solr/'>solr</a>, <a href='http://petrushev.wordpress.com/tag/sphinxsearch/'>sphinxsearch</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/09/24/rdbms-over-search-engines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
		<item>
		<title>I got an answer from David Gross</title>
		<link>http://petrushev.wordpress.com/2010/08/10/answer-david-gross/</link>
		<comments>http://petrushev.wordpress.com/2010/08/10/answer-david-gross/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 22:19:48 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[physics]]></category>
		<category><![CDATA[alternative theory]]></category>
		<category><![CDATA[David Gross]]></category>
		<category><![CDATA[experimental validation]]></category>
		<category><![CDATA[Higgs boson]]></category>
		<category><![CDATA[LHC]]></category>
		<category><![CDATA[Standard Model]]></category>
		<category><![CDATA[supersymmetry]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=74</guid>
		<description><![CDATA[Today I was very surprised to discover that I got a video answer on YouTube by the renowned physicist and Nobel Laureate David Gross &#8211; as part of the nobelprize.org event &#8220;Ask a Nobel Laureate&#8221;. I found out about this event on its facebook page, but at first didn&#8217;t give it much attention since there were a lot [...]]]></description>
			<content:encoded><![CDATA[<p>Today I was very surprised to discover that I got a video answer on YouTube by the renowned physicist and Nobel Laureate <a title="David Gross" href="http://en.wikipedia.org/wiki/David_Gross" target="_blank">David Gross</a> &#8211; as part of the <a title="nobelprize.org" href="http://nobelprize.org" target="_blank">nobelprize.org</a> event &#8220;Ask a Nobel Laureate&#8221;. I found out about this event on its facebook page, but at first didn&#8217;t give it much attention since there were a lot of nonsense questions, along with pseudo-science and science fiction. But later I decided to give it a try: I am a physicist, I know a great deal about the Standard Model, I understand what the guys at the Large Hadron Collider (LHC) are doing, and what I asked was a genuine interest of mine. <a title="Questions for David Gross" href="http://www.facebook.com/notes/nobelprizeorg/questions-for-david-gross/401404001719" target="_blank">Here</a> is my question [look for Baze Petrushev]:</p>
<p><strong>Suppose we fail at discovering the Higgs boson and SUSY </strong>(supersymmetry)<strong> at LHC &#8211; do physicists have an alternative theory on mass generation and supersymmetry? And if not, will the data from LHC be enough for physicists to invent a replacement of (part of) the Standard Model?</strong></p>
<p>This was part of my concern about the sometimes overwhelming confidence that physicists have in the established scientific models. Lord Kelvin, for example, was strongly convinced that, in his time, only a minuscule part of the natural world remained unknown to mankind. I guess that since his time the world-view of the scientific community has changed, as a myriad of beautiful, strange theories emerged.</p>
<p>After all, the Standard Model has survived the test of time many times over, as it became one of the most precise theories, along with the General Theory of Relativity. And even if its part with the Higgs boson and/or supersymmetry fails, the good guys will find a way to fix it. At least, this is what David Gross assures me in his answer:</p>
<p><b>What happens if the LHC fails to find the Higgs boson?</b></p>
<span style="text-align:center; display: block;"><a href="http://petrushev.wordpress.com/2010/08/10/answer-david-gross/"><img src="http://img.youtube.com/vi/eGU2G0dHlbw/2.jpg" alt="" /></a></span>
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/alternative-theory/'>alternative theory</a>, <a href='http://petrushev.wordpress.com/tag/david-gross/'>David Gross</a>, <a href='http://petrushev.wordpress.com/tag/experimental-validation/'>experimental validation</a>, <a href='http://petrushev.wordpress.com/tag/higgs-boson/'>Higgs boson</a>, <a href='http://petrushev.wordpress.com/tag/lhc/'>LHC</a>, <a href='http://petrushev.wordpress.com/tag/physics/'>physics</a>, <a href='http://petrushev.wordpress.com/tag/standard-model/'>Standard Model</a>, <a href='http://petrushev.wordpress.com/tag/supersymmetry/'>supersymmetry</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/08/10/answer-david-gross/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
		<item>
		<title>Pattern extraction with pyparsing</title>
		<link>http://petrushev.wordpress.com/2010/07/18/pattern-extraction-pyparsing/</link>
		<comments>http://petrushev.wordpress.com/2010/07/18/pattern-extraction-pyparsing/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 16:03:01 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[match]]></category>
		<category><![CDATA[pattern]]></category>
		<category><![CDATA[pyparsing]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[textprocessing]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=65</guid>
		<description><![CDATA[My previous post that had to do with pyparsing considered exact parsing grammars, that is, parsers that are applied to completely structured text, such as configuration files or logs. But sometimes there is a need to look for patterns in files, and you can also use pyparsing to this end. The key difference is using [...]]]></description>
			<content:encoded><![CDATA[<p>My previous <a href="http://petrushev.wordpress.com/2010/04/19/pyparsing-windows-ini-config-parser/">post</a> that had to do with <a href="http://pyparsing.wikispaces.com/">pyparsing</a> considered exact parsing grammars, that is, parsers that are applied to completely structured text, such as configuration files or logs. But sometimes there is a need to look for patterns in files, and you can also use pyparsing to this end. The key difference is using the pyparsing grammar objects&#8217; scanString() method instead of the parseString() method.</p>
<p>I wrote two convenience functions that use these methods:</p>
<p><code><br />
def match_one(grammar, text):<br />
<span style="visibility:hidden;">....</span>    try:<br />
<span style="visibility:hidden;">........</span>        match, start, end = next(grammar.scanString(text))<br />
<span style="visibility:hidden;">........</span>        return text[:start].strip(), match, text[end:].strip()<br />
<span style="visibility:hidden;">....</span>    except StopIteration:<br />
<span style="visibility:hidden;">........</span>        # no match found<br />
<span style="visibility:hidden;">........</span>        return text, None, None<br />
</code></p>
<p>This one searches for the grammar (pattern) in the text, then returns the text before the match, the match itself, and the text after the match. It only looks for the first occurrence of the pattern in the text. The scanString() method itself returns a generator object that yields tuples of the match (as a pyparsing result object), the start position and the end position of the match in the text. </p>
<p>Here is an example of looking for numbers in text:</p>
<p><code><br />
from pyparsing import Word, nums<br />
grammar = Word(nums)<br />
text = 'some 3 text 45 in'<br />
print(match_one(grammar, text))<br />
</code></p>
<p>It returns &#8216;some&#8217;, (['3'], {}), &#8216;text 45 in&#8217;.</p>
<p>The other function is this one:</p>
<p><code><br />
def search(grammar, text):<br />
<span style="visibility:hidden;">....</span>    start_=0<br />
<span style="visibility:hidden;">....</span>    result = []<br />
<span style="visibility:hidden;">....</span>    for match, start, end in grammar.scanString(text):<br />
<span style="visibility:hidden;">........</span>        result.append((text[start_:start].strip(), match))<br />
<span style="visibility:hidden;">........</span>        start_=end<br />
<span style="visibility:hidden;">....</span>    result.append((text[start_:].strip(),None))<br />
<span style="visibility:hidden;">....</span>    return result<br />
</code></p>
<p>This searches for all occurrences in the text, giving the text between the matches and the match itself, as tuples. The last item in the list is the remainder and None. Here is the sample from above:</p>
<p><code><br />
print(search(grammar, text))<br />
[('some', (['3'], {})), ('text', (['45'], {})), ('in', None)]<br />
</code></p>
<p>These are especially helpful if you need lexical extraction or text highlighting. They can process whole batches of text in a few moments.</p>
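<p>As a rough illustration of the highlighting use case, here is a small sketch that rebuilds a string from the list that search() returns, wrapping each match in markers. The function name highlight and the bracket markers are made up for this example, and the matches are written as plain lists standing in for pyparsing result objects so that the snippet runs on its own. Because search() strips the text between matches, the original whitespace is not preserved.</p>

```python
def highlight(pieces, before="[", after="]"):
    """Rebuild a string from search()-style pieces, marking each match."""
    out = []
    for text, match in pieces:
        if text:
            out.append(text)
        if match is not None:
            # a match behaves like a list of tokens; join them if several
            out.append(before + " ".join(match) + after)
    return " ".join(out)


# plain lists stand in for the pyparsing results of the sample above
pieces = [("some", ["3"]), ("text", ["45"]), ("in", None)]
print(highlight(pieces))  # prints: some [3] text [45] in
```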
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/match/'>match</a>, <a href='http://petrushev.wordpress.com/tag/pattern/'>pattern</a>, <a href='http://petrushev.wordpress.com/tag/pyparsing/'>pyparsing</a>, <a href='http://petrushev.wordpress.com/tag/python/'>python</a>, <a href='http://petrushev.wordpress.com/tag/textprocessing/'>textprocessing</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/07/18/pattern-extraction-pyparsing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
		<item>
		<title>On sqlalchemy – what do you put in your base model?</title>
		<link>http://petrushev.wordpress.com/2010/06/22/sqlalchemy-base-model/</link>
		<comments>http://petrushev.wordpress.com/2010/06/22/sqlalchemy-base-model/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 12:27:32 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[basemodel]]></category>
		<category><![CDATA[classmethod]]></category>
		<category><![CDATA[eagerloading]]></category>
		<category><![CDATA[json]]></category>
		<category><![CDATA[orm]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[serialize]]></category>
		<category><![CDATA[sqlalchemy]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=48</guid>
		<description><![CDATA[My previous post regarded the reflective use of sqlalchemy, and there I showed how to set up a codebase for BaseModel and schema mapper. I put only one method in my BaseModel, and that was actually the property session, which returns the session that holds the instance (None if the instance is not added). Now, [...]]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">My <a href="/2010/06/16/reflective-approach-on-sqlalchemy-usage/">previous post</a> covered the reflective use of <a href="http://www.sqlalchemy.org/">sqlalchemy</a>, and there I showed how to set up a codebase for the BaseModel and the schema mapper. I put only one member in my BaseModel: the property session, which returns the session that holds the instance (None if the instance has not been added).</div>
<p></p>
<div>Now, before I continue, just a short description of this test database. The director table holds the movie directors; lastname is required and unique. The movie table holds the movies: the movie field is required and unique, director_id is the link to the director (it can be null, for an unknown/not-added director), and year is also optional. The genre table holds just the genre names, and movie_genre is only an association table. Before you continue, you should check the mappers we added in the previous post.</div>
<p></p>
<div>There is an easy way to view all the information packed in a list with a coupled query, but we&#8217;ll do it the sqlalchemy way, that is, we&#8217;ll query it with its ORM.</div>
<p><code>
<div>from sqlalchemy.orm import eagerload_all</div>
<div id="_mcePaste">movies=session.query(Movie)\</div>
<div id="_mcePaste"><span style="visibility:hidden;">..............</span>.options(eagerload_all(Movie.director, Movie.genres))\</div>
<div id="_mcePaste"><span style="visibility:hidden;">..............</span>.limit(10).all()</div>
<p></code></p>
<div>Notice that we added an option called eagerload_all that specifies which mapped properties of the queried objects should be loaded into the session up front, together with the objects themselves &#8211; the so-called eager loading. Use this if and only if you are certain that you will need these properties. We added a limit of 10 only for clarity. If I print these objects like this:</div>
<p><code>
<div id="_mcePaste">for m in movies:</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>print("\"%s\" (%s) by %s %s (%s)"</div>
<div id="_mcePaste"><span style="visibility:hidden;">..........</span>% (m.movie, m.year, m.director.firstname, m.director.lastname,</div>
<div id="_mcePaste"><span style="visibility:hidden;">..........</span>", ".join([g.genre for g in m.genres])))</div>
<p></code></p>
<div id="_mcePaste">I get my sample database:</div>
<p><code>
<div id="_mcePaste">"A Clockwork Orange" (1971) by Stanley Kubrick (Sci-fi)</div>
<div id="_mcePaste">"Dr. Strangelove" (None) by Stanley Kubrick (Comedy)</div>
<div id="_mcePaste">"2001: A Space Oddisey" (1969) by Stanley Kubrick (Post-modern, Sci-fi)</div>
<div id="_mcePaste">"Lolita" (None) by Stanley Kubrick ()</div>
<div id="_mcePaste">"Shadow of a Doubt" (None) by Alfred Hitchcock (Noir, Mystery)</div>
<div id="_mcePaste">"Vertigo" (1957) by Alfred Hitchcock (Mystery)</div>
<div id="_mcePaste">"Citizen Kane" (1941) by Orson Welles ()</div>
<div id="_mcePaste">"Europa" (1991) by Lars von Trier ()</div>
<div id="_mcePaste">"Annie Hall" (None) by Woody Allen (Comedy)</div>
<div id="_mcePaste">"Element of Crime" (1984) by Lars von Trier ()</div>
<p></code></p>
<div id="_mcePaste">Now that we have test data, I&#8217;ll continue with my BaseModel.</div>
<p></p>
<div id="_mcePaste">There is always the question of what kind of foreign key we should use. We can agree that an incremental unsigned integer id for every table is a good way of efficiently linking the tables, removing the complexity of composite foreign keys or keys with slow look-up, but sometimes people tend to put these id-s into the application. Surely, the director with id 4 bears no information, but when you say director &#8216;Hitchcock&#8217; you get meaningful data in your application. So, my approach is &#8211; keep the id-s in your database and out of the application. Use a &#8216;natural&#8217; key to refer to records from the application.</div>
<p></p>
<div id="_mcePaste">If I proceed in this manner, it would be good to have a method load() associated with each model that returns a unique record/object (if it exists) given a set of arguments. Retrieving unique records with sqlalchemy is very easy: sqlalchemy&#8217;s querying mechanism has a method one() that in the background makes a LIMIT 2: if two records are found &#8211; it throws MultipleResultsFound; if there are no results &#8211; it throws NoResultFound; and if it finds exactly one &#8211; it returns it. Here&#8217;s how:</div>
<p><code>
<div id="_mcePaste">d = session.query(Director)\</div>
<div id="_mcePaste">
<span style="visibility:hidden;">...........</span>.filter(Director.lastname=='Hitchcock').one()</div>
<div id="_mcePaste">print(d.firstname)</div>
<p></code></p>
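<div>To make the LIMIT 2 idea concrete, here is a small sketch in plain python, without sqlalchemy: it looks at no more than two rows of any iterable and decides between the three outcomes. The function one() and the two exception classes are defined locally for illustration; they only mirror the names sqlalchemy uses and are not its implementation.</div>

```python
from itertools import islice


class NoResultFound(Exception):
    """Local stand-in for sqlalchemy's exception of the same name."""


class MultipleResultsFound(Exception):
    """Local stand-in for sqlalchemy's exception of the same name."""


def one(rows):
    """Return the only item in rows, looking at no more than two items."""
    first_two = list(islice(rows, 2))  # the 'LIMIT 2' idea
    if not first_two:
        raise NoResultFound("no rows")
    if len(first_two) == 2:
        raise MultipleResultsFound("more than one row")
    return first_two[0]


print(one(iter(["Hitchcock"])))  # prints: Hitchcock
```

<div>Two rows are enough to tell "exactly one" from "more than one", so nothing beyond the second row ever needs to be fetched.</div>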
<div id="_mcePaste">One genuine feature of the python language is the class method, which implicitly takes the class as its first parameter. So, instead of doing this:</div>
<p><code>
<div>class Director(BaseModel):</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>@staticmethod</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>def load(session, lastname):</div>
<div id="_mcePaste"><span style="visibility:hidden;">........</span>return session.query(Director)\</div>
<div id="_mcePaste">
<span style="visibility:hidden;">......................</span>.filter(Director.lastname==lastname)\</div>
<div id="_mcePaste">
<span style="visibility:hidden;">......................</span>.one()</div>
<p></code></p>
<div id="_mcePaste">which we would have to repeat for every model, we can put something more generic in the BaseModel:</div>
<p><code>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>@classmethod</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>def load(cls, session, **kwargs):</div>
<div id="_mcePaste"><span style="visibility:hidden;">........</span>q = session.query(cls)</div>
<div id="_mcePaste">
<span style="visibility:hidden;">........</span>filters = [getattr(cls, field_name)==kwargs[field_name] \</div>
<div id="_mcePaste">
<span style="visibility:hidden;">...................</span>for field_name in kwargs]</div>
<div id="_mcePaste">
<span style="visibility:hidden;">........</span>return q.filter(and_(*filters)).one()</div>
<p></code></p>
<div id="_mcePaste">With this, every model gets a load method that is callable on the class itself! I don&#8217;t think you can do a similar thing as easily in most other languages &#8211; please let me know if one can. The tricky part of this method is the list comprehension, but it is quite straightforward: we just &#8216;compile&#8217; a list with one filter for every named argument. For example, for the Movie model, the call:</div>
<p><code>
<div id="_mcePaste">Movie.load(session, movie='Europa', year=1991)</div>
<p></code></p>
<div>translates the list comprehension into:</div>
<p><code>
<div>[Movie.movie=='Europa', Movie.year==1991]</div>
<p></code></p>
<div>Later, we unpack the list as positional arguments to and_() (from sqlalchemy.sql.expression), which combines the two filters into a single filter clause. The cls argument is the Movie class/model itself, which is passed implicitly when the method is called on the class.</div>
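<div>The same mechanics can be tried without a database. The sketch below is a toy, assuming an in-memory list of rows (the _rows attribute and the LookupError are my own inventions for illustration); it keeps the two ideas from the method above: cls arrives implicitly, and one filter is built per keyword argument.</div>

```python
class BaseModel(object):
    """Toy stand-in for the post's BaseModel; rows live in a plain list."""

    _rows = ()

    def __init__(self, **fields):
        self.__dict__.update(fields)

    @classmethod
    def load(cls, **kwargs):
        # One predicate per keyword argument, mirroring the list
        # comprehension that builds the sqlalchemy filters.
        filters = [
            (lambda row, name=name, value=value: getattr(row, name) == value)
            for name, value in kwargs.items()
        ]
        # all() plays the role of and_(): every filter must hold.
        matches = [row for row in cls._rows if all(f(row) for f in filters)]
        if len(matches) != 1:
            raise LookupError(
                "expected exactly one %s, found %d" % (cls.__name__, len(matches))
            )
        return matches[0]


class Movie(BaseModel):
    pass


Movie._rows = [
    Movie(movie="Europa", year=1991),
    Movie(movie="Psycho", year=1960),
]

m = Movie.load(movie="Europa", year=1991)
print(m.movie, m.year)  # Europa 1991
```

<div>Note that load is defined once on the base class, yet Movie.load searches Movie's rows, because cls is whichever class the method was called on.</div>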
<p></p>
<div>Another very basic method also receives the class as its first argument: __new__. It is called to create the object, before __init__ runs. If it is not overridden, it simply returns a new instance of the class; if it is overridden, it may return that default instance, some other specific object, or no object at all. If it returns an instance of the class, the __init__ method will then be called on the newly created object. This method is quite handy for constructing the singleton pattern. No one will try to put a BaseModel instance into a session, but we&#8217;ll forbid it anyway:</div>
<p><code>
<div>def __new__(cls):</div>
<div id="_mcePaste">
<span style="visibility:hidden;">....</span>if cls==BaseModel:</div>
<div id="_mcePaste">
<span style="visibility:hidden;">........</span>raise TypeError("You can not instantiate a BaseModel")</div>
<div id="_mcePaste">
<span style="visibility:hidden;">....</span>return super(BaseModel, cls).__new__(cls)</div>
<p></code></p>
<div>So, we raise a TypeError on any attempt to create a BaseModel instance directly, while for a child class we simply create and return the instance as usual.</div>
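<div>Since the singleton pattern was mentioned, here is a minimal sketch of it built on __new__. The Registry class and its items attribute are hypothetical, made up only for this illustration; it also shows that __init__ runs again on every call, because __new__ keeps returning an instance of the class.</div>

```python
class Registry(object):
    """Hypothetical class used only to illustrate the pattern."""

    _instance = None

    def __new__(cls):
        # Create the object once, then keep handing back the same one.
        if cls._instance is None:
            cls._instance = super(Registry, cls).__new__(cls)
        return cls._instance

    def __init__(self):
        # Runs on every Registry() call, so only set up state that
        # does not exist yet.
        if not hasattr(self, "items"):
            self.items = []


first = Registry()
first.items.append("director")
second = Registry()
print(first is second)  # True
print(second.items)     # ['director']
```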
<p></p>
<div>For deleting a single object (using the session), sqlalchemy provides:</div>
<p><code>
<div>session.delete(obj)</div>
<p></code></p>
<div>We would like to expose deletion on the instance itself, simply:</div>
<p><code>
<div>def drop(self):</div>
<div id="_mcePaste">
<span style="visibility:hidden;">....</span>self.session.delete(self)</div>
<p></code></p>
<div>Here one can see the usefulness of having a session property on the instance. To use:</div>
<p><code>
<div>d = Director.load(session, lastname='Hitchcock')</div>
<div id="_mcePaste">d.drop()</div>
<p></code></p>
<div>My next item in the BaseModel is a serializer. I&#8217;ll make a json serialize method that simply serializes the dictionary of the object without the</div>
<div>_sa_instance_state attribute, which can not be encoded:</div>
<p><code>
<div>def serialize(self):</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>tmp = self.__dict__.copy()</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>del tmp['_sa_instance_state']</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>return simplejson.dumps(tmp)</div>
<p></code></p>
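<div>Here is a small sketch of the same idea that runs without sqlalchemy, using the standard library's json module. FakeState is an assumption of mine that merely imitates the bookkeeping object the orm attaches to instances; pop() with a default is used so the method does not fail when the attribute is absent.</div>

```python
import json


class FakeState(object):
    """Stand-in for the orm's per-instance state object (an assumption)."""


class Director(object):
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
        # Imitates the attribute the orm adds to mapped instances.
        self._sa_instance_state = FakeState()

    def serialize(self):
        tmp = self.__dict__.copy()
        # Drop the one attribute json can not encode.
        tmp.pop("_sa_instance_state", None)
        return json.dumps(tmp, sort_keys=True)


d = Director("Alfred", "Hitchcock")
print(d.serialize())  # {"firstname": "Alfred", "lastname": "Hitchcock"}
```

<div>Values that json can not encode on its own, such as dates, would still need separate handling.</div>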
<div>Finally, I needed a specific method for one model in a project, and it found its way into the BaseModel. Namely, I want to know the position of a record in a table when sorting by a particular column. This is simply the count of records whose value in the sorting column is smaller than that of the given record. I can put this in my BaseModel:</div>
<p><code>
<div>def position(self, order):</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>cls = self.__class__</div>
<div id="_mcePaste"><span style="visibility:hidden;">....</span>return self.session.query(cls)\</div>
<div id="_mcePaste">
<span style="visibility:hidden;">...............</span>.filter(getattr(cls, order)&lt;getattr(self, order))\</div>
<div id="_mcePaste">
<span style="visibility:hidden;">...............</span>.count()</div>
<p></code></p>
<div>The __class__ property of the instance simply returns the particular model.</div>
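<div>The counting logic is easy to check on a plain list. In this sketch the records list replaces the table and the position function is a free function rather than a method; both are simplifications of mine. The result is the zero-based rank in ascending order.</div>

```python
class Movie(object):
    def __init__(self, movie, year):
        self.movie = movie
        self.year = year


def position(record, records, order):
    # Number of records whose sort column is strictly smaller than the
    # record's own value, i.e. its zero-based rank in ascending order.
    pivot = getattr(record, order)
    return sum(1 for other in records if pivot > getattr(other, order))


movies = [Movie("Psycho", 1960), Movie("Europa", 1991), Movie("Vertigo", 1958)]
print(position(movies[0], movies, "year"))   # 1
print(position(movies[0], movies, "movie"))  # 1
```

<div>Records that tie on the sorting column get the same position, since only strictly smaller values are counted.</div>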
<p></p>
<div>The next post on sqlalchemy will probably have some methods and properties in the specific models &#8211; all done with sqlalchemy&#8217;s orm querying &#8211; just some stuff I had to learn by trial and error. But I think my next post will deal with <a href="http://pyparsing.wikispaces.com/">pyparsing</a> again, and I will parse in a different way than <a href="/2010/04/19/pyparsing-windows-ini-config-parser/">that time</a>.
</div>
<p></p>
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/basemodel/'>basemodel</a>, <a href='http://petrushev.wordpress.com/tag/classmethod/'>classmethod</a>, <a href='http://petrushev.wordpress.com/tag/eagerloading/'>eagerloading</a>, <a href='http://petrushev.wordpress.com/tag/json/'>json</a>, <a href='http://petrushev.wordpress.com/tag/orm/'>orm</a>, <a href='http://petrushev.wordpress.com/tag/python/'>python</a>, <a href='http://petrushev.wordpress.com/tag/serialize/'>serialize</a>, <a href='http://petrushev.wordpress.com/tag/sqlalchemy/'>sqlalchemy</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/06/22/sqlalchemy-base-model/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
		<item>
		<title>A reflective approach on sqlalchemy usage</title>
		<link>http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/</link>
		<comments>http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 17:27:25 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[orm]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[reflective]]></category>
		<category><![CDATA[sqlalchemy]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=30</guid>
		<description><![CDATA[Sometimes we are forced to work with legacy databases and we don&#8217;t have the beautiful green field to create our schema declaratively. Sqlalchemy provides a very thorough way of reflecting the schema over our business logic and mapping the tables and relations with our models. Say we have a simple database for videostore with according [...] ]]></description>
			<content:encoded><![CDATA[<div>Sometimes we are forced to work with legacy databases and we don&#8217;t have the beautiful green field to create our schema <a href="http://blog.brandonbloom.name/2009/10/orms-and-declarative-schemas.html">declaratively</a>. <a href="http://www.sqlalchemy.org/">Sqlalchemy</a> provides a very thorough way of reflecting the schema over our business logic and mapping the tables and relations with our models.</div>
<p></p>
<div>Say we have a simple video store database laid out according to the following entity relationship diagram.</div>
<div><img src="http://petrushev.files.wordpress.com/2010/05/videostore-rel.png?w=600" alt="Videostore ER diagram" /></div>
<div>(created with <a href="http://www.dbvis.com/">dbvis</a>)</div>
<p></p>
<div>I&#8217;ll split the code in 2 modules: base and models. The base module will contain the reusable code that we can use with any db schema later on, while the models module will contain the specifics for this schema. We&#8217;ll have a simple mysql database with unicode charset.</div>
<p></p>
<div>The base module contains one function and one class. Let&#8217;s import some stuff:</div>
<p><code></p>
<div>from sqlalchemy import MetaData, create_engine</div>
<div>from sqlalchemy.orm import mapper, sessionmaker</div>
<div>from sqlalchemy.orm.session import object_session</div>
<p></code></p>
<div>We will put our BaseModel class here, and extend the other models from it.</div>
<p><code></p>
<div>class BaseModel(object):</div>
<div><span style="visibility:hidden;">....</span>@property</div>
<div><span style="visibility:hidden;">....</span>def session(self):</div>
<div><span style="visibility:hidden;">........</span>return object_session(self)</div>
<p></code></p>
<div>I won&#8217;t argue about the usefulness of having a base model; I&#8217;ll show what we can put in it in a post yet to come. Sqlalchemy gives us the ability to fetch the session that holds an instance of some model by using object_session. Of course, if the object has not been added to a session, this will return None. Next is the reflect function:</div>
<p><code></p>
<div>def reflect(connection_string, models):</div>
<div><span style="visibility:hidden;">....</span>metadata = MetaData()</div>
<div><span style="visibility:hidden;">....</span>metadata.bind = create_engine(connection_string)</div>
<div><span style="visibility:hidden;">....</span>metadata.reflect()</div>
<div><span style="visibility:hidden;">....</span>mappers = {}</div>
<div><span style="visibility:hidden;">....</span>for table_name in metadata.tables:</div>
<div><span style="visibility:hidden;">........</span>model_name = "".join(part.capitalize()\</div>
<div><span style="visibility:hidden;">.............................</span>for part in table_name.split("_"))</div>
<div><span style="visibility:hidden;">........</span>try:</div>
<div><span style="visibility:hidden;">............</span>model = getattr(models, model_name)</div>
<div><span style="visibility:hidden;">........</span>except AttributeError:</div>
<div><span style="visibility:hidden;">............</span>raise NameError("Model %s not found in module %s"</div>
<div><span style="visibility:hidden;">..............................</span>%(model_name, repr(models)))</div>
<div><span style="visibility:hidden;">........</span>mappers[table_name] = mapper(model, metadata.tables[table_name])</div>
<div><span style="visibility:hidden;">....</span>Session = sessionmaker(metadata.bind, autocommit=False)</div>
<div><span style="visibility:hidden;">....</span>return (mappers, metadata.tables, Session)</div>
<p></code></p>
<div>In the reflect function, after metadata.reflect() is called, the metadata contains the reflected tables from our schema. We proceed to create mappers over our models. To do this in a more automated manner, let us make a convention that our model names will simply be the names of the tables in CamelCase, e.g., the director table will map to the model Director and movie_genre to MovieGenre. We need to have these models declared in a module that gets passed as the second parameter to the reflect function.</div>
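<div>The naming convention and the model lookup can be tried in isolation. In this sketch model_name_for and find_model are helper names of my own, and a SimpleNamespace stands in for the real models module; no database is involved.</div>

```python
import types


def model_name_for(table_name):
    # "movie_genre" becomes "MovieGenre". Note that capitalize() also
    # lower-cases the rest of each part.
    return "".join(part.capitalize() for part in table_name.split("_"))


def find_model(models, table_name):
    name = model_name_for(table_name)
    try:
        return getattr(models, name)
    except AttributeError:
        raise NameError("Model %s not found in %r" % (name, models))


class Director(object):
    pass


class MovieGenre(object):
    pass


# SimpleNamespace stands in for the real `models` module here.
models = types.SimpleNamespace(Director=Director, MovieGenre=MovieGenre)

print(model_name_for("movie_genre"))               # MovieGenre
print(find_model(models, "director") is Director)  # True
```

<div>A table without a matching class, say genre in this toy namespace, ends in the NameError, which is the same early failure the reflect function gives.</div>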
<p></p>
<div>After the iteration over metadata.tables, our models will be mapped to our tables. One more thing that makes use of the metadata is the Session class, which we generate with sqlalchemy&#8217;s sessionmaker. The autocommit setting is a matter of personal preference; I personally want more control over whether instances in the session are committed or expunged in my applications. We complete our reflect function by returning the mappers, tables and the session class.</div>
<p></p>
<div>Now we move to models, the code specific to our schema. We will need these:</div>
<p><code></p>
<div>from sys import modules</div>
<div>from sqlalchemy.orm import relationship</div>
<div>from base import BaseModel, reflect</div>
<p></code></p>
<div>One should notice the 0.5-&gt;0.6 sqlalchemy transition of relation to relationship here. We won&#8217;t put any code in the models (maybe in the next post):</div>
<p><code></p>
<div>class Director(BaseModel):</div>
<div><span style="visibility:hidden;">....</span>pass</div>
<div>class Movie(BaseModel):</div>
<div><span style="visibility:hidden;">....</span>pass</div>
<div>class Genre(BaseModel):</div>
<div><span style="visibility:hidden;">....</span>pass</div>
<div>class MovieGenre(BaseModel):</div>
<div><span style="visibility:hidden;">....</span>pass</div>
<p></code></p>
<div>And, we define the mappers in a function that is some kind of functional extension to the reflect function:</div>
<p><code></p>
<div>def map(connection_string):</div>
<div><span style="visibility:hidden;">....</span>models=modules['models']</div>
<div><span style="visibility:hidden;">....</span>mappers, tables, Session = reflect(connection_string, models)</div>
<div><span style="visibility:hidden;">....</span>mappers["director"].add_properties({</div>
<div><span style="visibility:hidden;">........</span>"movies": relationship(models.Movie,</div>
<div><span style="visibility:hidden;">...............................</span>backref="director",</div>
<div><span style="visibility:hidden;">...............................</span>cascade="all, delete-orphan")</div>
<div><span style="visibility:hidden;">....</span>})</div>
<div><span style="visibility:hidden;">....</span>mappers["movie"].add_properties({</div>
<div><span style="visibility:hidden;">........</span>"genres": relationship(models.Genre,</div>
<div><span style="visibility:hidden;">...............................</span>backref="movies",</div>
<div><span style="visibility:hidden;">...............................</span>secondary=tables['movie_genre'])</div>
<div><span style="visibility:hidden;">....</span>})</div>
<div><span style="visibility:hidden;">....</span>return (mappers, tables, Session)</div>
<p></code></p>
<div>Notice that we got the &#8216;models&#8217; module (needed for the second argument in the reflect function) from the modules[] &#8211; it is this module itself. One can argue about how to implement/extend the relationships, but the point is that we want an easy access for getting a session, here being simply:</div>
<p><code></p>
<div>session = map(conn_str)[2]()</div>
<p></code></p>
<div>Finally, I want to stress that this is not meant to be code that you simply put in a production environment, but to outline a way to get the most from an already built database schema using sqlalchemy. After all, we live and work in a world where you can not rely on your current database setup and have to make frequent changes to the schema. The declarative way of doing things is unaffordable.</div>
<p></p>
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/orm/'>orm</a>, <a href='http://petrushev.wordpress.com/tag/python/'>python</a>, <a href='http://petrushev.wordpress.com/tag/reflective/'>reflective</a>, <a href='http://petrushev.wordpress.com/tag/sqlalchemy/'>sqlalchemy</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/06/16/reflective-approach-on-sqlalchemy-usage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>

		<media:content url="http://petrushev.files.wordpress.com/2010/05/videostore-rel.png" medium="image">
			<media:title type="html">Videostore ER diagram</media:title>
		</media:content>
	</item>
		<item>
		<title>What is so great about pyparsing?</title>
		<link>http://petrushev.wordpress.com/2010/04/19/pyparsing-windows-ini-config-parser/</link>
		<comments>http://petrushev.wordpress.com/2010/04/19/pyparsing-windows-ini-config-parser/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 12:12:02 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[pyparsing]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=10</guid>
		<description><![CDATA[or: simple .ini parser with pyparsing The answer to the question above is: readable regular expressions. Code readability is probably the most common reason (or one of the most common reasons) why someone decides to code something in python. When you come to an area of regular expressions, no matter how good you are at writing them, the [...] ]]></description>
			<content:encoded><![CDATA[<p>or:</p>
<h3 style="font-size:120%;">simple .ini parser with pyparsing</h3>
<p>The answer to the question above is: readable regular expressions. Code readability is probably the most common reason (or one of the most common reasons) why someone decides to code something in python. When it comes to regular expressions, no matter how good you are at writing them, revisiting, refactoring and modifying them is always a big problem. It is simple &#8211; regular expressions, however powerful, are quite unreadable. Pyparsing deals with this problem.</p>
<p><a href="http://pyparsing.wikispaces.com/" target="_blank">Pyparsing</a> is a library written entirely in python that provides a set of classes and utilities for building readable grammar expressions for parsing extremely complex structured texts. You don&#8217;t need to know regular expressions. In fact, in all the time I have used it, I never stumbled upon a regular expression created by this library or one that the library needed me to write, so you can start parsing your text right away. If you&#8217;re not familiar with the topic of parsing, you can read a very good introductory article by <a href="http://www.oreillynet.com/pub/au/2557" target="_blank">Paul McGuire</a> <a href="http://www.oreillynet.com/pub/a/python/2006/01/26/pyparsing.html" target="_blank">here</a>, where he also explains the formal notation for parsing grammars known as Backus-Naur form (BNF).</p>
<p>The same author also wrote this wonderful <a href="http://oreilly.com/catalog/9780596514235/" target="_blank">beginner&#8217;s book</a> which includes about 90% of all you&#8217;ll need to know on parsing with pyparsing.</p>
<p>There is no better way of presenting this than with an example. Allow me the pleasure of using a nonsense windows .ini configuration file in the manner of (monty) python:</p>
<p><code> [db]<br />
user=eric<br />
pass=idle</code></p>
<p><code> </code></p>
<p><code>[timeout]<br />
ip=127.0.1.1<br />
time = 4</code></p>
<p><code> </code></p>
<p><code>[users]<br />
names = idle, gilliam</code></p>
<p>Now, the problem is how to fetch this into a useful python dictionary. We notice that the configuration file is separated into 3 namespaces (db, timeout, users), and each of them contains one or more definition lines that contain the literal &#8220;=&#8221;. How does pyparsing work? It works by creating a grammar for each element of the text and later combining and grouping them into one unified grammar, optionally defining specific parse actions or setting result names along the way. Let&#8217;s go on with a &#8220;hello world&#8221; example:<br />
<code><br />
from pyparsing import Word, alphas</code></p>
<p><code> </code></p>
<p><code>grammar = Word(alphas)<br />
tokens = grammar.parseString("hello!")<br />
print(tokens)<br />
</code><br />
- result -<br />
<code>['hello']</code></p>
<p>You can see that the exclamation point did not make it into the resulting tokens, since the grammar expression is just a word made of alphabetic characters.</p>
<p>Let&#8217;s dive into our problem. Each of the three namespaces has a header in brackets. We will define it as:<br />
<code><br />
word = Word(alphas)<br />
header = Suppress("[")+word.setResultsName("header")+Suppress("]")+LineEnd()<br />
</code></p>
<p>Of course, you&#8217;ll have to import all the new names from pyparsing (Suppress, LineEnd and later some others). We defined the word grammar first because we will use it again later. Suppress tells the grammar not to include the expression in the results, thus preventing a clutter of brackets in the end. One nice thing here is the .setResultsName() method, which lets us refer to a specific part of the resulting tokens by name.</p>
<p>We see that all the definition lines are separated with &#8220;=&#8221;, and on the left side is the definer, which is a simple word. The values on the right side, however, vary, and are one of the following: word, list of words, number, ip. Thus, we have the following grammars:</p>
<p><code>number = Word(nums)<br />
list_of_words = Group(ZeroOrMore(word + Suppress(",")) + word)<br />
ip_field = Word(nums, max=3)<br />
ip = Combine(ip_field+"."+ip_field+"."+ip_field+"."+ip_field)</code></p>
<p><code> </code></p>
<p><code>definer = word.setResultsName("definer")<br />
value = Or([number, word, list_of_words, ip]).setResultsName("value") </code></p>
<p>Here, we can see how to build list of stuff separated with something (ZeroOrMore) and combining tokens into one (Combine). The Word grammar object has parameters for limitations of its definition like max in this example, but also exact, bodyChars and min. Also, as our right side in the definition line varies, we use the Or expression builder. Of course, there is also And().</p>
<p>Now, we are moving to the finalization of our parser. We have all our elementary building blocks needed (header, definer, value), so, we can build the more complex ones. This is how:</p>
<p><code>definition_line = Group(definer+Suppress("=")+value+LineEnd())<br />
namespace = Group(header+\<br />
<span style="visibility:hidden;">................</span>OneOrMore(definition_line).setResultsName("definition_lines"))<br />
all = OneOrMore(namespace).setResultsName("namespaces")</code></p>
<p>Now, what have we done here? Let&#8217;s review from top to bottom. The complete grammar defined as all consists of one or more (OneOrMore() ) namespaces. Each namespace consists of header and one or more definition lines. And each definition line consists of a definer and a value. We added some Group() clauses as well as .setResultsName() on the parts we liked to name our result &#8211; and we are ready to parse! Get on with it!</p>
<p><code>result = all.parseString(content)</code></p>
<p>Huh? That&#8217;s it?!? Yes. Our result is neatly placed in a nice tree structure we can traverse with the attributes we set with .setResultsName(). You can check it with these:</p>
<p><code>for namespace in result.namespaces:<br />
<span style="visibility:hidden;">....</span>print(namespace.header)<br />
<span style="visibility:hidden;">....</span>for definition_line in namespace.definition_lines:<br />
<span style="visibility:hidden;">........</span>print(definition_line.definer)<br />
<span style="visibility:hidden;">........</span>print(definition_line.value)</code></p>
<p>Of course, I won&#8217;t be kind enough to present you with the complete parser we just built here. Fetch the content from a file and do proper (non-wild) import of all the building parts from pyparsing.</p>
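<p>To have something to check your finished parser against, the target dictionary for the sample file can be sketched with the standard library's configparser. To be clear, this is a different tool from pyparsing and not the parser built in this post; the ini_to_dict helper is a name of my own, and the values stay plain strings (no numbers or lists).</p>

```python
import configparser

# The sample file from this post, without leading spaces.
CONTENT = """\
[db]
user=eric
pass=idle

[timeout]
ip=127.0.1.1
time = 4

[users]
names = idle, gilliam
"""


def ini_to_dict(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # One inner dictionary per namespace (section).
    return {section: dict(parser[section]) for section in parser.sections()}


result = ini_to_dict(CONTENT)
print(result["db"])        # {'user': 'eric', 'pass': 'idle'}
print(result["timeout"])   # {'ip': '127.0.1.1', 'time': '4'}
print(result["users"])     # {'names': 'idle, gilliam'}
```

<p>The pyparsing version can go further than this, since its value grammar already tells apart numbers, ip addresses and lists of words.</p>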
<p>What is so great about pyparsing? Well, we don&#8217;t have to learn regular expressions. We got our results in a nice data structure. The library is in python and it is very easy to dive into it. But here is the greatest asset: it is readable &#8211; you can go back to the code at any time and modify it with ease! </p>
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/parser/'>parser</a>, <a href='http://petrushev.wordpress.com/tag/pyparsing/'>pyparsing</a>, <a href='http://petrushev.wordpress.com/tag/python/'>python</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/04/19/pyparsing-windows-ini-config-parser/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
		<item>
		<title>For starters&#8230;</title>
		<link>http://petrushev.wordpress.com/2010/04/19/for-starters/</link>
		<comments>http://petrushev.wordpress.com/2010/04/19/for-starters/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 10:01:19 +0000</pubDate>
		<dc:creator>Blagoj Petrushev</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[jinja]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[nltk]]></category>
		<category><![CDATA[pyparsing]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[sqlalchemy]]></category>
		<category><![CDATA[werkzeug]]></category>

		<guid isPermaLink="false">http://petrushev.wordpress.com/?p=5</guid>
		<description><![CDATA[Sometimes a simple google search simply can not give you the answers you are looking for. You might have to look up a book reference, browse through some blogs (the forums are not that useful anymore), and, at times, even the official documentation is not enough. I recently read a very thoughtful post which underlines [...] ]]></description>
			<content:encoded><![CDATA[<p>Sometimes a simple google search simply can not give you the answers you are looking for. You might have to look up a book reference, browse through some blogs (the forums are not that useful anymore), and, at times, even the official documentation is not enough. I recently read a very thoughtful <a title="post" href="http://artificialcode.blogspot.com/2010/04/my-midlife-python-quality-crisis.html" target="_blank">post</a> which underlines (among a multitude of things) the importance of negative feedback. And, a few years ago, I learned from a very good physics lecturer that the best way to learn about a topic is to write a book on it.</p>
<p>Well, I&#8217;m not going to write a book, but I can try to write a blog. Its intention is not to teach anyone about programming topics (mostly python and linguistic processing), but to learn along the way while writing it, and to learn from the negative feedback.</p>
<p>I recently started working on linguistic processing. I use pyparsing and nltk for it.</p>
<p>I like using werkzeug and jinja2 for a web interface. I use sqlalchemy to talk to a database. Goodbye sql.</p>
<p>A friend of mine who has quite a bit of experience in this field recommended Django. I hesitated, but decided to go with the loosely coupled option, since I like jinja2 better than Django templates and I like sqlalchemy more than the Django ORM. I mostly don&#8217;t need the other conveniences Django provides.</p>
<p>That&#8217;s it for now. Stay tuned if any of the attached tags interest you.</p>
<br /> Tagged: <a href='http://petrushev.wordpress.com/tag/jinja/'>jinja</a>, <a href='http://petrushev.wordpress.com/tag/nlp/'>nlp</a>, <a href='http://petrushev.wordpress.com/tag/nltk/'>nltk</a>, <a href='http://petrushev.wordpress.com/tag/pyparsing/'>pyparsing</a>, <a href='http://petrushev.wordpress.com/tag/python/'>python</a>, <a href='http://petrushev.wordpress.com/tag/sqlalchemy/'>sqlalchemy</a>, <a href='http://petrushev.wordpress.com/tag/werkzeug/'>werkzeug</a>]]></content:encoded>
			<wfw:commentRss>http://petrushev.wordpress.com/2010/04/19/for-starters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/874487f2d4d79853b5f6e800238b4824?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">petrushev</media:title>
		</media:content>
	</item>
	</channel>
</rss>
