Thursday, 23 October, 2008

ColdFusion Markup Language (CFM) Parser in Python

This is a simple ColdFusion Markup Language (CFM, CFML, CFC) parser written in Python. The parser finds the places where CFQueryParam validation is missing and produces corrected versions of those statements.

In ColdFusion, whenever we place a variable into an SQL statement inside a CFQuery tag, we must wrap it in a CFQueryParam tag to protect the query from SQL injection. However, there are many cases where the developer has missed these tags (knowingly or unknowingly), and applying CFQueryParam to all of them at a later stage is a very tedious job. So I came up with this script that does the job for you. The project has its own home page, and I have also put my very basic script here to help those who are looking for a similar kind of script. You can always go to the project home page and download the latest version, with bug fixes and many more features.
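To make the rewrite concrete, here is a minimal sketch of the kind of substitution involved. It is written for Python 3 and is not the script itself: the function name `parameterize`, the sample query, and the simplified pattern are all assumptions made for illustration, and every value is tagged as CF_SQL_VARCHAR just to keep the example short.

```python
import re

# Simplified pattern: "column = #variable#" or "column = '#variable#'".
# Group 1 is the left-hand side, group 2 the ColdFusion variable.
UNVALIDATED = re.compile(r"(\b[\w.]+\s*=\s*)'?(#[\w.]+#)'?")


def parameterize(sql):
    """Wrap bare #variables# in a <cfqueryparam> tag (illustrative only)."""
    def wrap(match):
        lhs, value = match.group(1), match.group(2)
        return '%s<cfqueryparam value="%s" cfsqltype="CF_SQL_VARCHAR" />' % (lhs, value)
    return UNVALIDATED.sub(wrap, sql)


sample = "SELECT * FROM users WHERE name = '#form.name#'"
print(parameterize(sample))
```

A value that is already wrapped in a CFQueryParam tag does not match the pattern, so running the function twice leaves the statement unchanged.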

It scans a particular folder, creates a list of files that need CFQueryParam validation tags, and works out what the statements look like with the tags applied. IT DOES NOT change the original file. It just tells you where changes are needed and displays the new SQL statement that should be there. So it leaves scope for manual work. Guys, relax, you won't be fired :-)
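The folder scan can be sketched roughly as follows, in Python 3. This is an assumption about how such a scan might look, not the project's code: the function names, the file extensions, and the regular expressions are hypothetical, and the check is a plain text heuristic rather than a real CFML parse. Files are only read, never written.

```python
import os
import re

CFQUERY_BLOCK = re.compile(r"<cfquery\b[^>]*>(.*?)</cfquery>", re.IGNORECASE | re.DOTALL)
CFQUERYPARAM = re.compile(r"<cfqueryparam\b[^>]*>", re.IGNORECASE)
HASH_VARIABLE = re.compile(r"#[\w.]+#")


def has_unvalidated_query(text):
    """Return True if any cfquery body still contains a bare #variable#."""
    for block in CFQUERY_BLOCK.findall(text):
        # Drop the tags that are already validated, then look for leftovers.
        remainder = CFQUERYPARAM.sub("", block)
        if HASH_VARIABLE.search(remainder):
            return True
    return False


def files_needing_validation(folder, extensions=(".cfm", ".cfml", ".cfc")):
    """Walk the folder and list the ColdFusion files that need attention."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, "r", errors="replace") as handle:
                    if has_unvalidated_query(handle.read()):
                        flagged.append(path)
    return flagged
```

Calling `files_needing_validation` on a project folder returns the list of paths to review by hand.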

And for those who want everything to be done by this program, or who want to report issues, you can visit the project's homepage.

#!/usr/bin/env python

## A script to make sure all CFM Files have the CFQueryParam validation tags
## Copyright (C) 2008 Pranav Prakash
## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
## You should have received a copy of the GNU General Public License
## along with this program. If not, see

import sgmllib, re

class CFMLParser(sgmllib.SGMLParser):
    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.insideCFQuery = False
        self.insideComment = False
        self.insideLogic = False
        self.SQLQueries = []
        self.QueryNames = []
        self.NewSQLQueries = []
        self.tempQuery = ''
        # "column = #variable#" assignments that are not wrapped in cfqueryparam
        self.unvalidatedPattern = re.compile(r"\s\w+\.?\w+\s=\s'?#\w+\.?\w+\(?\w*\)?#'?")
        # quoted values are treated as varchar
        self.varcharPattern = re.compile(r"'#\w+\.?\w+\(?\w*\)?#'")
        self.nonVarcharpattern = self.token = re.compile(r"#\w+\.?\w+\(?\w*\)?#")
        self.lhs = re.compile(r"\s\w+\.?\w+\s=\s")
        = dict({'bit':'CF_SQL_BIT',

    def start_cfquery(self, attributes):
        self.insideCFQuery = True
        for k,v in attributes:
            if k == 'name':
                self.QueryNames.append(v)

    def end_cfquery(self):
        self.insideCFQuery = False
        if self.tempQuery != '':
            # keep the collected SQL text before resetting the buffer
            self.SQLQueries.append(self.tempQuery)
            self.tempQuery = ''

    def start_cfqueryparam(self, attributes):
        self.tempQuery += self.get_starttag_text()

    def end_cfqueryparam(self):

    def start_cfif(self, attributes):
        if self.insideCFQuery:
            self.tempQuery += self.get_starttag_text()
            self.insideLogic = True

    def end_cfif(self):
        if self.insideLogic:
            self.tempQuery += '</cfif>'
            self.insideLogic = False

    def start_cfelse(self, attributes):
        if self.insideLogic:
            self.tempQuery += self.get_starttag_text()

    def handle_data(self, data):
        if self.insideLogic or self.insideCFQuery and len(data.lstrip()) > 0:
            self.tempQuery += data

    def handle_comment(self, comment):
        if self.insideCFQuery:
            self.tempQuery += '<!--'+comment+'-->'

    def report_unbalanced(self, tag):
        if tag == 'cfqueryparam':
        if tag == 'cfelse':

    def get_QueryNames(self):
        return self.QueryNames

    def get_OldSQLQueries(self):
        return self.SQLQueries

    def get_NewSQLQueries(self):
        return self.NewSQLQueries

    def ScanQuery(self, query):
        # rewrite every unvalidated assignment found in the query
        self.NewSQLQueries.append(re.sub(self.unvalidatedPattern, self.handleIndividualTokens, query))

    def findDataType(self, lvalue, rvalue):
        for k in
            p = k+r'\w+'
            pa = re.compile(p)
            l = pa.findall(rvalue)
            if l != []:
        for k in
            p = r'\.?'+k+r'\w+'
            pa = re.compile(p)
            l = pa.findall(lvalue)
            if l != []:

    def handleIndividualTokens(self, s):
        # s is the match object handed over by re.sub
        tag = s.group()
        m = self.varcharPattern.findall(tag)
        if len(m) > 0:
            rhsValue = self.token.findall(m[0])[0]
            lhsValue = self.lhs.findall(tag)[0]
            finalVal = lhsValue + '<cfqueryparam value = "'+rhsValue+'" cfsqltype="CF_SQL_VARCHAR" />'
            return finalVal
        lhsValue = self.lhs.findall(tag)[0]
        rhsValue = self.nonVarcharpattern.findall(tag)[0]
        finalVal = lhsValue + '<cfqueryparam value = "'+ rhsValue+'" cfsqltype="'+self.findDataType(lhsValue, rhsValue)+'" />'
        return finalVal

    def ScanQueries(self):
        for SQL in self.SQLQueries:
            self.ScanQuery(SQL)

def ScanAndReplace(text):
    myCFMLParser = CFMLParser()
    # the parser has to see the text before any queries can be collected
    myCFMLParser.feed(text)
    myCFMLParser.ScanQueries()
    o = myCFMLParser.get_OldSQLQueries()
    n = myCFMLParser.get_NewSQLQueries()
    # walk over every query, not just the first and the last one
    for i in range(len(n)):
        text = text.replace(o[i], n[i])
    return text

if __name__ == '__main__':
    inFile = '/home/pranav/projects/cfmparser/test.cfm'
    f = open(inFile, 'r')
    FileContentText = f.read()
    f.close()

    print ScanAndReplace(FileContentText)
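
The type lookup in findDataType appears to rely on naming prefixes: a key such as 'bit' is searched for in the variable first and then in the column name. A rough Python 3 sketch of that idea follows. Only the 'bit' to 'CF_SQL_BIT' pair comes from the script above; the other table entries, the function name `find_data_type`, and the CF_SQL_VARCHAR fallback are assumptions added for illustration.

```python
import re

# Hypothetical prefix table: only 'bit' -> 'CF_SQL_BIT' appears in the post;
# the remaining entries are assumptions for the sake of the example.
PREFIX_TYPES = {
    "bit": "CF_SQL_BIT",
    "int": "CF_SQL_INTEGER",
    "dt": "CF_SQL_TIMESTAMP",
    "str": "CF_SQL_VARCHAR",
}


def find_data_type(lvalue, rvalue, default="CF_SQL_VARCHAR"):
    """Guess a cfsqltype from a naming prefix (heuristic, assumed fallback)."""
    # Check the variable (right side) first, then the column (left side).
    for text in (rvalue, lvalue):
        for prefix, sql_type in PREFIX_TYPES.items():
            # The prefix must start a name, e.g. right after '#' or '.'.
            if re.search(r"(?<![A-Za-z0-9_])" + re.escape(prefix) + r"\w+", text):
                return sql_type
    return default


print(find_data_type(" u.active = ", "#form.bitActive#"))
```

A prefix heuristic like this can guess wrong (a column called `interval` would be read as an integer), which is one more reason to review the suggested statements by hand.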

Don't forget to check the latest development on the project's home page.
