A templating system in Python

Posted by lec** on Sunday, February 17 2008 @ 13:47:43 GMT        
I have already discussed templating systems in a prior article, from July 2007 - it's called templating with PHP, so read it if you are unfamiliar with the concept of templating systems.

Gotcha!
You code in Python too? Ok...!! D:
Many web frameworks come with built-in templating capabilities. Commercial website solutions often use these frameworks because they are huge time-savers and come with great features you can use. However, if you really want just a templating system, and don't want the extra benefits of using a whole framework, you'd be better off using something simpler. I, for one, hate it when frameworks make use of templates that contain embedded code - it feels too bloated when what I really want is a simple, concise, and to-the-point way of embedding variables from my script in HTML. No special Python parsing, and no template conditional or looping statements to complicate things (if you agree, you'll love the rest of this article).

However, that's when you'll step into something I (and probably many other people, but I haven't seen it used anywhere, hence the "I") call a "Google vacuum" - there are no results from multiple Google searches you run that meet your requirements. It looks like no one has ever done it before. Well someone probably has, but you won't find information about it anywhere on the Internet. That's why I'll be showing you how to do it here, so you can save a little time.

The requirements
What exactly do we want our templating system to do? Firstly, it must have the basic capability of finding special "placeholders" in a series of templates we have broken our website into. When I first started writing one, I was immediately faced with a strange thing - Python, unlike Perl or PHP, doesn't have "special magic double quotes" that replace any variables inside them with their string equivalents, if present (or, in Perl, regardless of whether the variable has been declared). So this meant that I would have to do that manually, preferably in as simple a way as possible. Next, the templating system needs to be able to "cache" templates before use, so it fetches all the templates with one query (or from files, if you want to do it that way), but if you forget to list all the templates you're planning to use in the script, it can fetch additional templates on demand. Lastly, there has to be some escaping mechanism to disable template variables in a piece of user-submitted text, so that placeholders typed by a visitor are not rendered when the page is parsed.

When you take all that into consideration, it almost appears that you'd be better off using Django. Well, perhaps. But Django's template language also brings its own logic into templates (conditional and looping tags, for example), which we don't want. So the only option is to write it anyway. Let's take a look...

Think of a suitable template placeholder format
Simple enough - you need to define what a template placeholder will look like. I chose the @variablename@ format for no particular reason, only because the @ character is rarely used outside of email addresses and those little "the following message is intended for *someone*" statements (which are informal, and are unlikely to find their way into the design of the website anyway). You could as well use #, %, [ ], @@, or whatever you want, but choose it in a way that will allow you to easily differentiate template variables from normal text, and obviously choose a character that won't come up very often in the rest of your site.

Now comes the slightly challenging part. You'll need a regex that will ignore escaped versions of your templates, replacing only the ones that are valid. I used the @\templatename@ format for escaped template names, because the original \@templatename@ format was not good (because of how I designed the feature to work, the escape character needed to be inside the @ delimiters). For the original format, the regular expression for a template variable that should be replaced was "(?<!\\\\)@[A-Za-z0-9_]+@". The first part between the brackets is called a negative lookbehind (in case you're not familiar with it) and the four backslashes were necessary, since the first escapes the second, and the third escapes the fourth for the purpose of the quotes (for the Python interpreter) and then of the two backslashes actually interpreted by the regular expression engine, one escapes the other, which means that after all this escaping, what the regex engine actually searches for is a single backslash - whew! With the backslash inside the delimiters, the plain pattern "@[A-Za-z0-9_]+@" is enough, because a backslash is not one of the allowed name characters; that is the pattern the code below uses. And of course the next part defines that a template name can contain the symbols A-Z, a-z and 0-9 with underscores - that's what a Python variable can contain.

Now we've got the regex, we need a function that will determine if there is a variable that is called templatename (in the case of @templatename@) or whatever is located between the @ characters. If that is the case, the function needs to replace the @templatename@ with a string version of whatever the variable contains. This should be done as a callback, with the main templating function then proceeding to replace the next template located, again by calling the callback.
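To make the backslash counting less painful, here is a small sketch of the lookbehind pattern written as a raw string, where only two backslashes are needed. The names PLACEHOLDER and sample are made up for this illustration, and the example uses the \@name@ style of escaping that the lookbehind was written for.

```python
import re

# In a raw string Python leaves the backslashes alone, so the regex
# engine sees (?<!\\)@[A-Za-z0-9_]+@ - "not preceded by a backslash".
PLACEHOLDER = re.compile(r"(?<!\\)@[A-Za-z0-9_]+@")

# The four-backslash version in ordinary quotes is the very same string.
same_pattern = "(?<!\\\\)@[A-Za-z0-9_]+@" == r"(?<!\\)@[A-Za-z0-9_]+@"

sample = r"Hello @name@, write \@name@ to show the marker itself."
found = PLACEHOLDER.findall(sample)  # only the un-escaped placeholder
```

Raw strings are usually the easier way to write regular expressions in Python, since every backslash you type is one the regex engine gets to see.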

You shouldn't substitute @doctype@ with the equivalent Python (global) variable, doctype, since that could be a security risk - you could accidentally include a variable that you didn't want to disclose to the public in your template, like @password@, and show everyone your database login information. Thus, you should make @variable@ get replaced by T_variable from the main file (I did it like this; you can use any other prefix or suffix, a dictionary, an object or even a module's namespace). This also means that in your code, you can easily distinguish which variables become members of templates later - the contents of T_doctype might be your doctype declaration, and @doctype@ will show the contents of it - it's easier, clearer and safer too. Let's have a look at what this could be coded as:

import re

def template_individual_callback(identifier, variable, template):
    """
    Internally used as a callback by template_replace as a part of template replacing
    """
    if variable is not None:
        # restore identifier to what it looks like in the template
        identifier = "@" + identifier + "@"

        # force conversion to string, since the template vars may be numbers
        template = template.replace(identifier, ("%s" % variable))
    return template

def template_replace(template):
    """
    Replaces any template variable position markers with their values.
    It is best called just before ElectronTemplate.print_output()
    """
    matches = re.findall("@[A-Za-z0-9_]+@", template)
    for match in matches:
        identifier = match.replace("@", "")

        # this is an escaped template, begins with a backslash - ignore it
        if identifier[0] == '\\':
            continue

        # there's got to be a better solution to this...
        # (relies on the Python 2 exec statement being able to rebind the
        # local name "template"; Python 3's exec() cannot do that)
        s = ("try: tmpref = T_" + identifier + "\n"
             "except: pass\n"
             "else: template = template_individual_callback(identifier, T_"
             + identifier + ", template)")
        exec(s)

    return template

That code will replace any un-escaped template placeholders of the @placeholder@ format with their corresponding Python variables - T_placeholder, in this case. The hacky-looking part works as it should, but I can't be bothered to work out a better solution for it. Essentially it tests whether the variable exists (is set - Python has no isset() function like PHP) - and then calls the template_individual_callback() function with the correct data to replace the placeholder with the variable.
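For readers who would rather avoid exec, one possible alternative is sketched below. It is not the code used on this site: it assumes the T_ variables are handed over in a dictionary (passing globals() would get close to the behaviour above), and it uses re.sub with a callback function instead of building Python source as a string. The parameter name variables is made up for this sketch.

```python
import re

PLACEHOLDER = re.compile(r"@([A-Za-z0-9_]+)@")

def template_replace(template, variables):
    """Replace @name@ with str(variables['T_name']) where that key exists."""
    def substitute(match):
        key = "T_" + match.group(1)
        value = variables.get(key)
        if value is None:
            # unknown (or unset) placeholder: leave the text untouched
            return match.group(0)
        return str(value)
    return PLACEHOLDER.sub(substitute, template)

# only names with the T_ prefix are ever exposed to templates
variables = {"T_title": "Home", "T_hits": 42, "password": "secret"}
page = template_replace("<h1>@title@</h1> hits: @hits@ @password@", variables)
```

Here page ends up as "<h1>Home</h1> hits: 42 @password@": the password entry has no T_ prefix, so its placeholder is left alone.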

If you've got templates embedded in the main template, those templates may contain template variables too. Therefore, you'll be replacing the whole page again and again, depending on how many layers of template/subtemplate you've got. Because of this, you will need a function to escape the template variables within a piece of text whenever you want, thus preventing forum or comment posters from accidentally (or intentionally) using template variables in their comments, and having them replaced with their Python-variable equivalents (which might be dangerous - although not as dangerous as this sounds, because a template variable shows the contents of T_template_variable instead of just template_variable). In any case, you don't want your perfectly-validating markup ruined by jokers including the doctype a gazillion times in their posts. So you need a function for finding all the template placeholders in some text (like the one for replacing them with their contents) but this time, one that will escape them. Here's what this might look like:

import re

def escape_templates(text):
    """
    Escapes any template placeholders in a piece of text, therefore making
    it safe to insert into a template to be parsed an arbitrary number of times.
    """
    matches = re.findall("@[A-Za-z0-9_]+@", text)
    for match in matches:
        identifier = match.replace("@", "")
        text = text.replace(match, ("@\\" + identifier + "@"))
    return text

That will change "hey, my name is @name@" into "hey, my name is @\name@". Now this is only useful if your template placeholder parsing function (the one I discussed above) does not parse these types of "escaped" placeholders. It doesn't, so good! However, a strange thing will now happen. Any text containing supposed template variables (like this article, which discusses them) will always print them out escaped. You need one more step in the parsing section, that will once more go through the text (when you've finished replacing the placeholders) and un-escape any escaped ones. It's safe to do this as long as you don't replace the text again, so this should optimally be done as a part of a function that prints the page out, along with any headers. Here it is:

def print_output(output):
    # un-escape any escaped placeholders, turning @\name@ back into @name@
    escaped_matches = re.findall("@\\\\[A-Za-z0-9_]+@", output)
    for escaped_match in escaped_matches:
        output = output.replace(escaped_match, escaped_match.replace("\\", ""))

    # print any headers that are needed, like content-type
    # (pendingheaders is a list of header strings defined elsewhere)
    for header in pendingheaders:
        print(header)

    # finally, print the actual page output
    print(output)

This function prints the final output. Just before it does that, it removes the backslashes in front of any escaped templates and so improves user-friendliness. The visitors don't need to know how the replacement is being done, and they certainly don't need it interfering with WHATEVER they choose to write in their comments or posts. This does the trick.
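To see the three steps working together, here is a compact, self-contained sketch of the escape / replace / un-escape cycle. It is a simplified stand-in rather than the functions above: values come from a plain dictionary (an assumption made to keep the example short), and the names replace_placeholders and unescape_templates are made up for it.

```python
import re

NAME = r"[A-Za-z0-9_]+"

def escape_templates(text):
    # @name@ -> @\name@, so later replacement passes leave it alone
    return re.sub("@(" + NAME + ")@", lambda m: "@\\" + m.group(1) + "@", text)

def replace_placeholders(text, values):
    # replace @name@ with values["name"]; unknown names stay as they are
    def substitute(m):
        return str(values[m.group(1)]) if m.group(1) in values else m.group(0)
    return re.sub("@(" + NAME + ")@", substitute, text)

def unescape_templates(text):
    # @\name@ -> @name@, done once, right before the page is printed
    return re.sub(r"@\\(" + NAME + ")@", lambda m: "@" + m.group(1) + "@", text)

comment = "my page starts with @doctype@"   # user-submitted text
values = {"doctype": "<!DOCTYPE html>", "comment": escape_templates(comment)}

page = replace_placeholders("@doctype@<p>@comment@</p>", values)
second_pass = replace_placeholders(page, values)   # changes nothing
final = unescape_templates(second_pass)
```

The visitor's "@doctype@" survives as literal text in final, while the real @doctype@ in the page template is replaced exactly once.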

There's really not much more to it. You can improve the performance by adding a template caching feature similar to the one I discussed in the templating with PHP article. All it should do is loop through a list of template identifiers, and retrieve them from the database:

def cache_templates(globaltemplates):
    """
    Adds the templates specified in the global templates list to the
    template cache, effectively saving many queries.
    """
    global templatecache

    # build a quoted, comma-separated list of names for the IN clause
    # (only safe while the names come from your own script, not from users)
    template_query = ", ".join(["'%s'" % template for template in globaltemplates])

    query = """
        SELECT templatedata, templatename
        FROM website_templates
        WHERE templatename IN (%s)
        ORDER BY templateid
    """ % template_query

    # db is a database object I use - this might be cursor.execute()
    # if you used the MySQLdb module
    templates = db.query_read(query)

    # add the templates to the templatecache
    for row in templates:
        # note: will not work if templates contain unicode characters
        templatecache[row["templatename"]] = row["templatedata"]

    return None
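If you would rather not paste the template names into the SQL string yourself, the same single query can be built with driver placeholders. The sketch below swaps in the standard library's sqlite3 module purely so that it runs on its own; the db object used in this article is not shown here, and the table layout in the example is an assumption based on the column names used above.

```python
import sqlite3

def cache_templates(conn, names, cache):
    """Load the named templates into cache with a single query."""
    if not names:
        return cache
    marks = ", ".join("?" for _ in names)          # one "?" per name
    query = ("SELECT templatename, templatedata FROM website_templates "
             "WHERE templatename IN (%s) ORDER BY templateid" % marks)
    for name, data in conn.execute(query, list(names)):
        cache[name] = data
    return cache

# a throwaway in-memory database, just for the demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE website_templates ("
             "templateid INTEGER PRIMARY KEY, templatename TEXT, templatedata TEXT)")
conn.executemany(
    "INSERT INTO website_templates (templatename, templatedata) VALUES (?, ?)",
    [("header", "<h1>@title@</h1>"), ("footer", "<p>@year@</p>"), ("unused", "x")])

templatecache = cache_templates(conn, ["header", "footer"], {})
```

Only the question marks are interpolated into the query text; the names themselves are passed separately, so the driver takes care of quoting them.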

Oh, and of course, a function to get the contents of a cached template, or run a new query if it isn't (you can log this as an error, and add it to the globaltemplates list later to save a query).

def fetch_template(templatename):
    """
    Retrieves a single template. Either finds it in the internal cache,
    or queries the database for it if it was not cached as a part of
    cache_templates (which shouldn't really happen, but still)

    @param string - The string identifier of the template to return
    @return string - The template contents
    """
    # templatecount is a module-level counter of templates fetched;
    # it must be declared global here because it is reassigned below
    global templatecache, templatecount

    # use the cached copy if there is one
    if templatecache.get(templatename) is not None:
        return templatecache[templatename]

    # it's not in the cache - fetch the template now (you could log this)
    result = db.query_read("""
        SELECT * FROM website_templates
        WHERE templatename = %s
    """, (templatename,))

    for row in result:
        templatecache[templatename] = row["templatedata"]
        templatecount += 1
        return row["templatedata"]

    # no template with that name exists
    return None

Some notes about the code provided in this article:
  1. The example code is structured as a set of functions that utilise the global template cache dictionary "templatecache". This is so you can analyse each function separately, to see how to do it. The code I actually use is different, structured as a class. You should probably do the same, it should be very easy to do.
  2. The "db" object used is my own database abstraction class, to make it easier to run a query. You will probably want to use the MySQLdb package directly, or you may have your own class to do this with.
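As a rough idea of what the class-based structure mentioned in note 1 could look like, here is a minimal sketch. It is not the class used on this site: the name TemplateEngine, the loader callable (any function that takes a template name and returns its text) and the misses counter are all assumptions made for the illustration.

```python
import re

class TemplateEngine:
    """Hypothetical wrapper that keeps the cache and the replacing together."""

    PLACEHOLDER = re.compile(r"@([A-Za-z0-9_]+)@")

    def __init__(self, loader):
        self.loader = loader   # callable: template name -> template text
        self.cache = {}
        self.misses = 0        # templates that had to be fetched on demand

    def preload(self, names):
        # fill the cache up front for the templates we know we will need
        for name in names:
            self.cache[name] = self.loader(name)

    def fetch(self, name):
        # fall back to the loader if the template was not preloaded
        if name not in self.cache:
            self.misses += 1
            self.cache[name] = self.loader(name)
        return self.cache[name]

    def render(self, name, values):
        # replace @key@ with values[key]; unknown keys are left untouched
        def substitute(match):
            value = values.get(match.group(1))
            return match.group(0) if value is None else str(value)
        return self.PLACEHOLDER.sub(substitute, self.fetch(name))

# a dictionary stands in for the database in this example
store = {"page": "<title>@title@</title>", "footer": "<p>@year@</p>"}
engine = TemplateEngine(store.get)
engine.preload(["page"])
title_html = engine.render("page", {"title": "Home"})
footer_html = engine.render("footer", {"year": 2008})
```

After this runs, engine.misses is 1, because "footer" was not in the preload list and had to be fetched on demand. A loader that returns None for an unknown name is not handled here.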


The cache_templates() function will fail if your templates contain any unicode. Should you wish to have support for it, replace the line

templatecache[row["templatename"]] = row["templatedata"]

with

templatecache[row["templatename"]] = u"".join(unicode(row["templatedata"], 'utf-8', 'replace'))

Don't forget to do this everywhere else where you store unicode strings in variables too.
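The unicode() call above only exists in Python 2. In Python 3 the rough equivalent is to decode the raw bytes, as sketched here under the assumption that your database driver hands the template back as a bytes object encoded in UTF-8 (the variable names are made up for the example).

```python
# bytes as they might come back from the database
raw = b"caf\xc3\xa9 @title@"
text = raw.decode("utf-8", "replace")

# with "replace", a byte that is not valid UTF-8 becomes U+FFFD
# instead of raising UnicodeDecodeError
broken = b"bad \xff byte".decode("utf-8", "replace")
```

If the driver already returns str objects, no decoding step is needed at all.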

That's it. You should now know enough to tackle the problem of making your own templating system without having to download and use a whole framework, or adopt someone's overblown templating system that is too complicated and too smart for what you need. If you wanted to write your own custom one, you now have some ideas; otherwise, you've just been given a simple, working templating system and you don't need to write it yourself. Quack!
lec**

Oct 05 2009 @ 06:11:59
I'll probably be on today. I've been totally buried with college work in the past few days.
BrandMan211

Oct 04 2009 @ 17:54:09
Well, all that is in the past. Glad to see you like SUIT. Can you sign on MSN some time? I'm a few steps away from the next version, and I could use some help.

I can't wait to see what you can do with SUIT. :D
lec**

Oct 02 2009 @ 15:23:31
I understand today's contemporary web design needs outgrow a system like this, like I said in the article. The code might not belong in the controller, but it doesn't belong in the templates either -- SUIT does solve this well, so I am actually going to be using it from now on.

I think my solution still seems better than other templating systems (mako for instance) for people like me, who just need a simple way to control output, but believe using something that looks like a scripting language fused into HTML code is unacceptably ugly and hacky. Very abstract too.

Finally, my controller deals with data, but it does control the templates too. If a loop is required to output stuff like a list of entries from the database, the code deals with them as data - in lists and dictionaries. They are looped near the end of the controller code, and replaced into templates. It's not half as messy as it sounds at first. Depending on the IDs passed, it can choose the correct template set, so output can be achieved in XML just as easily, or a completely different layout can be used. It's got some similarities to the MVC pattern.
Faltzer

Sep 28 2009 @ 18:09:21
Your code is messy. You shouldn't need to do that anyway. There are packages out there that handle templating just fine, such as Mako and Jinja2. If you're that unhappy with Python bindings in templating, then use the SUIT Framework (http://suitframework.com/). It does exactly what you're trying to do, but more efficiently.

Loops and conditionals are customary in web templating because if you stuff all of your code in the controller, you're also putting big chunks of your template in it. Your controller should return data, never markup. And not having such constructs would also mean that you are going to have a plethora of otherwise small template files mucking around, which defeats the point of template systems to begin with.

Writing your own template engine is a tricky thing. You're inventing an entirely new language, after all, albeit a simple one. For it to be very powerful (i.e., beyond simple string replacement), you'd probably have to write an actual stack-based parser rather than rely on str_replace.

You could do this with regular expressions, much as you could parse HTML with regular expressions, but it won't be very reliable, easy to read, easy to debug, or easy to extend.