2015-03-06 11:22:23 +13:00
defmodule LexLuthor do
2017-04-27 14:55:25 +12:00
alias LexLuthor.Runner
2015-03-06 11:22:23 +13:00
2017-04-27 14:55:25 +12:00
@moduledoc """
LexLuthor is a Lexer in Elixir ( say that 10 times fast ) which uses macros to generate a reusable lexers . Good times .
LexLuthor is a state based lexer , meaning that it keeps a state stack which you can push states on and pop states off the stack , which are used to filter the applicable rules for a given state . For example :
iex > defmodule StringLexer do
... > use LexLuthor
... > defrule ~r/ ^' / , fn ( _ ) -> :STRING end
... > defrule ~r/ ^[^']+ / , :STRING , fn ( e ) -> { :string , e } end
... > defrule ~r/ ^' / , :STRING , fn ( _ ) -> nil end
... > end
... > StringLexer . lex ( " 'foo' " )
{ :ok , [ % LexLuthor.Token { column : 1 , line : 1 , name : :string , pos : 1 , value : " foo " } ] }
Rules are defined by a regular expression , an optional state ( as an atom ) and an action in the form of an anonymous function .
When passed the string ` ' foo ' ` , the lexer starts in the ` :default ` state , so it filters for rules in the default state ( the first rule , as it doesn ' t specify a state), then it filters the available rules by the longest matching regular expression. In this case, since we have only one rule (which happens to match) it ' s automatically the longest match .
Once the longest match is found , then it ' s action is executed and the return value matched:
- If the return value is a single atom then that atom is assumed to be a state and push onto the top of the state stack .
- If the return value is a two element tuple then the first element is expected to be an atom ( the token name ) and the second element a value for this token .
- If the return value is ` nil ` then the top state is popped off the state stack .
If lexing succeeds then you will receive an ` :ok ` tuple with the second value being a list of ` LexLuthor.Token ` structs .
If lexing fails then you will receive an ` :error ` tuple which a reason and position .
"""
2015-03-06 11:22:23 +13:00
defmacro __using__ ( _opts ) do
quote do
@rules [ ]
2015-03-31 10:30:26 +13:00
@action_counter 0
2015-03-06 11:22:23 +13:00
import LexLuthor
@before_compile LexLuthor
end
end
defmacro __before_compile__ ( _env ) do
quote do
2021-01-27 10:20:18 +13:00
def lex ( string ) do
Runner . lex ( __MODULE__ , @rules , string )
2015-03-06 11:22:23 +13:00
end
end
end
2017-04-27 14:55:25 +12:00
@doc """
Define a lexing rule for a specific state .
- ` regex ` a regular expression for matching against the input string .
- ` state ` the lexer state in which this rule applies .
- ` action ` the function to execute when this rule is applied .
"""
2021-01-27 10:20:18 +13:00
@spec defrule ( Regex . t ( ) , atom , ( String . t ( ) -> atom | nil | { atom , any } ) ) ::
{ :ok , non_neg_integer }
2017-04-27 14:55:25 +12:00
defmacro defrule ( regex , state , action ) do
2015-03-06 11:22:23 +13:00
quote do
2021-01-27 10:20:18 +13:00
@action_counter @action_counter + 1
action_name = " _action_ #{ @action_counter } " |> String . to_atom ( )
action = unquote ( Macro . escape ( action ) )
defaction =
quote do
def unquote ( Macro . escape ( action_name ) ) ( e ) do
unquote ( action ) . ( e )
end
2015-03-31 10:30:26 +13:00
end
2015-03-06 11:22:23 +13:00
2021-01-27 10:20:18 +13:00
Module . eval_quoted ( __MODULE__ , defaction )
@rules @rules ++ [ { unquote ( state ) , unquote ( regex ) , action_name } ]
2017-04-27 14:55:25 +12:00
{ :ok , Enum . count ( @rules ) }
2015-03-06 11:22:23 +13:00
end
end
2017-04-27 14:55:25 +12:00
@doc """
Define a lexing rule applicable to the default state .
2015-03-06 11:22:23 +13:00
2017-04-27 14:55:25 +12:00
- ` regex ` a regular expression for matching against the input string .
- ` action ` the function to execute when this rule is applied .
"""
defmacro defrule ( regex , action ) do
quote do
2021-01-27 10:20:18 +13:00
defrule ( unquote ( regex ) , :default , unquote ( action ) )
2015-03-06 11:22:23 +13:00
end
end
end