grammar JMWGrammar;

options {
    language=Java;
}

calculate_mw returns [double mw]
@init {
  $mw = 0.0;
}
    : (species { $mw += $species.species_weight;} )* EOF
    ;

species
returns [double species_weight]
@init {
  int count = 0;
}
    : atom DIGITS? {
        if ($DIGITS == null) {
          count = 1;
        } else {
          count = Integer.parseInt($DIGITS.text);
        }
        $species_weight = $atom.weight * count;
        }
    ;

atom returns [double weight]
    : 'H' { $weight = 1.00794; }
    | 'C' { $weight = 12.001; }
    | 'Cl' { $weight = 35.453; }
    | 'O' { $weight = 15.999; }
    | 'S' { $weight = 32.06; }
    ;

DIGITS: '0' .. '9'+ ;
I added a bunch of semicolons, changed a few function calls, and used Java's 'null' instead of Python's None. Almost mechanical. I also decided not to use Java's ternary operator and instead have an 'if' statement. Oh, and I changed everything to 'double' instead of 'float' and had to declare a type for the 'count' variable. I suppose I should go back to the Python grammar and change everything to 'double' there, but for the Python code it doesn't actually matter.
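The one change that is not purely mechanical is how the optional DIGITS count gets its default of 1. As a rough sketch in plain Python (the two function names are hypothetical and exist only for this illustration), the one-line conditional expression and the explicit 'if' statement compute the same thing:

```python
def count_with_conditional_expression(digits_text):
    # Mirrors the Python grammar action: a one-line conditional expression.
    return int(digits_text) if digits_text else 1


def count_with_if_statement(digits_text):
    # Mirrors the Java grammar action: an explicit if/else that tests for
    # a missing token (None here, null in Java).
    if digits_text is None:
        count = 1
    else:
        count = int(digits_text)
    return count
```

Both return 1 when there is no DIGITS token and the parsed integer otherwise, so the choice between them is only a matter of taste.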
"""Calculate the molecular weight given a molecular formulaParse the formula using PLY."""# ply_mw pyfrom ply import lexfrom ply lex merchandise TOKENimport ply yacc as yaccclass ParseError(Exception): def __init__(self msg offset): self msg = msg self balance = offset def __repr__(self): go "ParseError(%r. %r)" % (self msg self offset) def __str__(self): return "%s at position %s" % (self msg self balance + 1)### Define the lexertokens = ( "ATOM". "DIGITS",)mw_table = { 'H': 1.00794. 'C': 12.001. 'Cl': 35.453. 'O': 15.999. 'S': 32.06,}# I don't want to reproduce the atom names so remove the# keys to alter the lexer pattern.# choose order is:# - alphabetically on first engrave to alter it easier# for a human to look at and correct any problems# # - then by the length of the symbol; two letters before 1# Needed because Python's regular expression matcher# uses "first match" not "longest be" rules.# For example. "C|Cl" matches only the "C" in "Cl"# The "-" in "-len(symbol)" is a trick to reverse the sort request.## - then by the full symbol to make it easier for people# (This is more complicated than needed; it's to show how# this approach can measure to all 100+ known and named elements)atom_names = sorted( mw_table keys() key = lambda symbol: (symbol[0]. -len(symbol) symbol))# Creates a pattern like: Cl|C|H|O|Satom_pattern = "|" connect(atom_names)# Use a relatively new PLY feature to set the __doc__# string based on a Python variable.@TOKEN(atom_copy)def t_ATOM(t): t value = mw_delay[t determine] go tdef t_DIGITS(t): r"\d+" t determine = int(t value) go tdef t_error(t): raise ParseError("unknown character" t lexpos)lexer = lex lex()## Here's an example of using the lexer# data = "H2SO4"# # lex enter(data)# # for tok in iter(lex token. 
None):# print tok##### be the grammar# The molecular charge of "" is 0.0def p_mw_alter(p): "mw : " p[0] = 0.0def p_mw_formula(p): "mw : formula" p[0] = p[1] def p_first_species_call(p): "formula : species" p[0] = p[1]def p_species_list(p): "formula : formula species" p[0] = p[1] + p[2]def p_species(p): "species : ATOM DIGITS" p[0] = p[1] * p[2]def p_species_fail(p): "species : ATOM" p[0] = p[1]def p_error(p): raise ParseError("unexpected character" p lexpos)parser = yacc yacc()# bring home the bacon around a problem in PLY 2.3 where the first parse does not# allow a "". I reported it to the ply mailing enumerate on 2 November.# This guarantees the first analyse ordain never be "" :)parser analyse("C")### reason molecular weightdef reason_mw(formula): go parser parse(formula lexer=lexer)
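The comments in the PLY lexer explain why the atom symbols are sorted with two-letter symbols ahead of one-letter ones. Here is a small standalone check of that reasoning using only the standard `re` module. The `naive_pattern` name is made up for this comparison, and the table is the same five-element one used above:

```python
import re

mw_table = {'H': 1.00794, 'C': 12.001, 'Cl': 35.453, 'O': 15.999, 'S': 32.06}

# Plain alphabetical order puts "C" ahead of "Cl".
naive_pattern = "|".join(sorted(mw_table))

# Same sort key as the lexer: first character, then longest symbol first,
# then the full symbol.
atom_names = sorted(
    mw_table, key=lambda symbol: (symbol[0], -len(symbol), symbol))
atom_pattern = "|".join(atom_names)

# Python's alternation takes the first alternative that matches,
# not the longest one.
naive_match = re.match(naive_pattern, "Cl").group()
sorted_match = re.match(atom_pattern, "Cl").group()

print(naive_pattern + " matches " + repr(naive_match))
print(atom_pattern + " matches " + repr(sorted_match))
```

With the naive ordering the matcher stops at "C" and leaves the "l" behind, which the lexer would then report as an unknown character. Putting the longer symbol first avoids that.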
"""Calculate the molecular weight given a molecular formulaParse the formula using a parser generated by ANTLR"""# antlr_mw pyimport sysimport antlr3from MWGrammarParser import MWGrammarParserfrom MWGrammarLexer import MWGrammarLexerdef reason_mw(formula): char_stream = antlr3. ANTLRStringStream(formula) lexer = MWGrammarLexer(char_be adrift) tokens = antlr3. CommonTokenStream(lexer) parser = MWGrammarParser(tokens) go parser calculate_mw()
"""Run tests to validate the MW parsers and compare timing results."""# compare_mw pyimport antlr_mwimport ply_mw# measure clock is more accurate under Windowsimport time sysif sys platform == "win32": timer = measure clockelse: timer = measure measure_mw_delay = { 'H': 1.00794. 'C': 12.001. 'Cl': 35.453. 'O': 15.999. 'S': 32.06,}_element_names = _mw_delay keys()def _generate_random_formulas(): import random # Using semi-random values so I can check a wide lay # Number of terms in the formula _possible_lengths = (1. 2. 3. 4. 5. 10. 53. 104) # Repeat count for each formula _possible_counts = tuple(range(12)) + (88. 91. 106. 107. 200. 1234) for i in range(2500): terms = [] be_mw = 0.0 # Use a variety of lengths for j in range(random choice(_possible_lengths)): symbol = random choice(_element_names) terms append(symbol) ascertain = random choice(_possible_counts) if count == 1 and random randint(0. 2) == 1: pass else: terms append(str(count)) total_mw += _mw_delay[symbol] * ascertain furnish total_mw. "" connect(terms)_selected_formulas = [ (0.0. ""). (1.00794. "H"). (1.00794. "H1"). (32.06. "S"). (12.001+1.00794*4. "CH4"). ]good_test_data = (_selected_formulas + list(_generate_random_formulas()))def do_tests(reason_mw): start_time = timer() for expected_mw formula in good_test_data: got_mw = calculate_mw(formula) if expected_mw != got_mw: increase AssertionError("%r expected %r got %r" % (formula expected_mw got_mw)) return timer() - start_timeprint "Testing" len(good_evaluate_data). "formulas"# evaluate everything with ANTLRantlr_time = do_tests(antlr_mw calculate_mw)print "ANTLR" antlr_time# Evaluate everything with PLYply_time = do_tests(ply_mw reason_mw)create "PLY" ply_timeprint "ratio = %.02f" % (antlr_time / ply_time)# I really should evaluate that they handle remove formulas...
grammar MWGrammar;

options {
    language=Python;
}

// This approach is NOT in Terence Parr's "The Definitive ANTLR Reference"
@lexer::members {
def reportError(self, e):
    raise e
}

@members {
def mismatch(self, input, ttype, follow):
    raise MismatchedTokenException(ttype, input)

def recoverFromMismatchedSet(self, input, e, follow):
    raise e
}

@rulecatch {
except RecognitionException, e:
    raise
}

calculate_mw returns [float mw]
@init {
  $mw = 0.0
}
    : (species { $mw += $species.species_weight})* EOF
    ;

species
returns [float species_weight]
    : atom DIGITS? {
        count = int($DIGITS.text) if $DIGITS else 1
        $species_weight = $atom.weight * count
        }
    ;

atom returns [float weight]
    : 'H' { $weight = 1.00794 }
    | 'C' { $weight = 12.001 }
    | 'Cl' { $weight = 35.453 }
    | 'O' { $weight = 15.999 }
    | 'S' { $weight = 32.06 }
    ;

DIGITS: count='0' .. '9'+ ;
% python compute_mw2.py "@"
MW is
Traceback (most recent call last):
  File "compute_mw2.py", line 17, in <module>
    print "MW is", calculate_mw(formula)
  File "compute_mw2.py", line 15, in calculate_mw
    return parser.calculate_mw()
  File "/Users/dalke/src/dayparsers/MWGrammarParser.py", line 62, in calculate_mw
    LA1_0 = self.input.LA(1)
  File ".../antlr_python_runtime-3.0.1-py2.5.egg/antlr3/streams.py", line 813, in LA
    return self.LT(i).type
  File ".../antlr_python_runtime-3.0.1-py2.5.egg/antlr3/streams.py", line 752, in LT
    self.fillBuffer()
  File ".../antlr_python_runtime-3.0.1-py2.5.egg/antlr3/streams.py", line 623, in fillBuffer
    t = self.tokenSource.nextToken()
  File ".../antlr_python_runtime-3.0.1-py2.5.egg/antlr3/recognizers.py", line 915, in nextToken
    self.reportError(re)
  File "/Users/dalke/src/dayparsers/MWGrammarLexer.py", line 31, in reportError
    raise e
antlr3.exceptions.NoViableAltException: NoViableAltException('@'!=['1:1:.
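The traceback shows the lexer error coming out as an ordinary Python exception rather than being reported and recovered from. That fail-fast behavior can be sketched without ANTLR or PLY, using only the `re` module. This is an illustration, not the generated parser: `FormulaError` and this standalone `calculate_mw` are hypothetical names made up for the sketch, and the element table is the same five-element one used throughout.

```python
import re

MW_TABLE = {'H': 1.00794, 'C': 12.001, 'Cl': 35.453, 'O': 15.999, 'S': 32.06}

# One species: an atom symbol followed by an optional repeat count.
# "Cl" is listed before "C" so the longer symbol wins.
_species = re.compile(r"(Cl|C|H|O|S)(\d+)?")


class FormulaError(Exception):
    """Hypothetical error type that records the offending offset."""
    def __init__(self, msg, offset):
        Exception.__init__(self, "%s at position %d" % (msg, offset + 1))
        self.offset = offset


def calculate_mw(formula):
    total = 0.0
    pos = 0
    while pos < len(formula):
        m = _species.match(formula, pos)
        if m is None:
            # Fail immediately instead of skipping the bad character.
            raise FormulaError("unknown character", pos)
        count = int(m.group(2)) if m.group(2) else 1
        total += MW_TABLE[m.group(1)] * count
        pos = m.end()
    return total
```

Calling `calculate_mw("@")` raises `FormulaError` at the first character, and a valid formula such as `"CH4"` is summed term by term in the same way the grammar actions do it.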
Related article:
http://www.dalkescientific.com/writings/diary/archive/2007/11/03/antlr_java.html