Class Asciidoctor::Parser
In: lib/asciidoctor/parser.rb
Parent: Object

Public: Methods to parse lines of AsciiDoc into an object hierarchy representing the structure of the document. All methods are class methods and should be invoked from the Parser class. The main entry point is ::next_block. No Parser instances shall be discovered running around. (Any attempt to instantiate a Parser will be futile).

The object hierarchy created by the Parser consists of zero or more Section and Block objects. Section objects may be nested and a Section object contains zero or more Block objects. Block objects may be nested, but may only contain other Block objects. Block objects which represent lists may contain zero or more ListItem objects.

Examples

  # Create a Reader for the AsciiDoc lines and retrieve the next block from it.
  # Parser.next_block requires a parent, so we begin by instantiating an empty Document.

  doc = Document.new
  reader = Reader.new lines
  block = Parser.next_block(reader, doc)
  block.class
  # => Asciidoctor::Block

Constants

BlockMatchData = Struct.new :context, :masq, :tip, :terminator
TabRx = /\t/   Regexp for replacing tab character
TabIndentRx = /^\t+/   Regexp for leading tab indentation
StartOfBlockProc = lambda {|l| ((l.start_with? '[') && BlockAttributeLineRx =~ l) || (is_delimited_block? l) }
StartOfListProc = lambda {|l| AnyListRx =~ l }
StartOfBlockOrListProc = lambda {|l| (is_delimited_block? l) || ((l.start_with? '[') && BlockAttributeLineRx =~ l) || AnyListRx =~ l }
NoOp = nil

Public Class methods

Remove the block indentation (the leading whitespace equal to the amount of leading whitespace of the least indented line), then replace tabs with spaces (using proper tab expansion logic) and, finally, indent the lines by the amount specified.

This method preserves the relative indentation of the lines.

lines  - the Array of String lines to process (no trailing endlines)
indent - the integer number of spaces to add to the beginning
         of each line; if this value is nil, the existing
         space is preserved (optional, default: 0)

Examples

  source = <<EOS
      def names
        @names.split ' '
      end
  EOS

  source.split "\n"
  # => ["    def names", "      @names.split ' '", "    end"]

  puts Parser.adjust_indentation!(source.split "\n") * "\n"
  # => def names
  # =>   @names.split ' '
  # => end

returns Nothing
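
The steps described above can be illustrated with a standalone sketch. It is not the Asciidoctor implementation: the method name adjust_indentation_sketch, the helper expand_tabs_sketch and the tab_size parameter are all assumptions made for illustration. The sketch expands tabs first so that indentation widths can be compared in spaces, and it returns a new Array rather than modifying its argument.

```ruby
# Hypothetical helper (not part of Asciidoctor): expand each tab to the next tab stop.
def expand_tabs_sketch(line, tab_size)
  out = ''.dup
  line.each_char do |ch|
    if ch == "\t"
      out << ' ' * (tab_size - (out.length % tab_size))
    else
      out << ch
    end
  end
  out
end

# Hypothetical sketch of the algorithm; tab_size is an assumed parameter.
def adjust_indentation_sketch(lines, indent = 0, tab_size = 4)
  expanded = lines.map { |line| expand_tabs_sketch(line, tab_size) }

  # nil means "preserve the existing indentation".
  return expanded if indent.nil?

  # The block indentation is the smallest leading-space count among non-blank lines.
  widths = expanded.reject { |l| l.strip.empty? }.map { |l| l[/\A */].length }
  offset = widths.min || 0

  # Strip the block indentation, then apply the requested indent.
  prefix = ' ' * indent
  expanded.map { |l| l.strip.empty? ? l : prefix + l[offset..-1] }
end

adjust_indentation_sketch(["    def names", "      @names.split ' '", "    end"])
# => ["def names", "  @names.split ' '", "end"]
```

Because every non-blank line loses the same number of leading spaces, the relative indentation of the lines is preserved.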

Whether a block supports complex content should be a config setting. If terminator is false, all the lines in the reader should be parsed. NOTE: a filter could be invoked here, before and after parsing.

Internal: Catalog any callouts found in the text, but don't process them

text     - The String of text in which to look for callouts
document - The current document on which the callouts are stored

Returns A Boolean indicating whether callouts were found

Internal: Catalog any inline anchors found in the text, but don't process them

text     - The String text in which to look for inline anchors
document - The current document on which the references are stored

Returns nothing

Internal: Initialize a new Section object and assign any attributes provided

The information for this section is retrieved by parsing the lines at the current position of the reader.

reader     - the source reader
parent     - the parent Section or Document of this Section
attributes - a Hash of attributes to assign to this section (default: {})

Public: Determines whether this line is the start of any of the delimited blocks

returns the match data if this line is the first line of a delimited block or nil if not
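
As a rough illustration of this check, the sketch below tests a line against a small table of four-character delimiter tips. The table contents, the Struct and the method name are assumptions for illustration; the real method uses the BlockMatchData Struct listed under Constants, whose masq field is omitted here because this page does not describe it.

```ruby
# Hypothetical stand-in for BlockMatchData (masq field left out).
BlockMatchSketch = Struct.new(:context, :tip, :terminator)

# Assumed table of delimiter tips for common AsciiDoc delimited blocks.
DELIMITERS_SKETCH = {
  '----' => :listing,
  '....' => :literal,
  '====' => :example,
  '****' => :sidebar,
  '____' => :quote,
  '++++' => :pass,
  '////' => :comment,
  '|===' => :table
}.freeze

# Returns match data if the line opens a delimited block, otherwise nil.
def delimited_block_sketch(line)
  line = line.rstrip
  return nil if line.length < 4

  tip = line[0, 4]
  context = DELIMITERS_SKETCH[tip]
  return nil unless context

  # Any characters after the tip must repeat the tip's final character.
  return nil unless line[4..-1].chars.all? { |c| c == tip[-1] }

  # The whole line doubles as the terminator to look for later.
  BlockMatchSketch.new(context, tip, line)
end

delimited_block_sketch('------').context  # => :listing
delimited_block_sketch('plain text')      # => nil
```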

Internal: Convenience API for checking if the next line on the Reader is the document title

reader     - the source Reader
attributes - a Hash of attributes collected above the current line

returns true if the Reader is positioned at the document title, false otherwise

Internal: Checks if the next line on the Reader is a section title

reader     - the source Reader
attributes - a Hash of attributes collected above the current line

returns the section level if the Reader is positioned at a section title, false otherwise

Public: Checks if these lines are a section title

line1 - the first line as a String
line2 - the second line as a String (default: nil)

returns the section level if these lines are a section title, false otherwise
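
A minimal sketch of such a check follows, assuming the usual AsciiDoc conventions: a single-line title starts with one to six '=' characters, and a two-line title is underlined with a repeated character whose length is within one character of the title length. The method name, the underline-to-level table and the length tolerance are assumptions, not taken from this page.

```ruby
# Assumed mapping from underline character to section level.
UNDERLINE_LEVELS_SKETCH = { '=' => 0, '-' => 1, '~' => 2, '^' => 3, '+' => 4 }.freeze

# Returns the Integer section level, or false if the lines are not a section title.
def section_title_level_sketch(line1, line2 = nil)
  # Single-line form: the level is the number of '=' characters minus one.
  if (m = /\A(={1,6})\s+\S/.match(line1))
    return m[1].length - 1
  end

  return false if line2.nil? || line1.strip.empty?

  # Two-line form: the second line repeats one known underline character.
  ch = line2[0]
  return false unless ch && UNDERLINE_LEVELS_SKETCH.key?(ch)
  return false unless line2.length >= 2 && line2 == ch * line2.length
  return false unless (line1.length - line2.length).abs <= 1

  UNDERLINE_LEVELS_SKETCH[ch]
end

section_title_level_sketch('==== Foo')    # => 3
section_title_level_sketch('Foo', '~~~')  # => 2
```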

Internal: Determine whether this line is a sibling list item according to the list type and trait (marker) provided.

line          - The String line to check
list_type     - The context of the list (:olist, :ulist, :colist, :dlist)
sibling_trait - The String marker for the list or the Regexp to match a sibling

Returns a Boolean indicating whether this line is a sibling list item given the criteria provided

Public: Calculate the number of unicode characters in the line, excluding the endline

line - the String to calculate

returns the number of unicode characters in the line
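
The distinction that matters here is characters versus bytes. The sketch below shows it with a hypothetical helper, line_length_sketch, and assumes a Ruby version (1.9 or later) in which String#length counts characters for UTF-8 strings.

```ruby
# Hypothetical helper: count characters, ignoring a trailing endline.
def line_length_sketch(line)
  line.chomp.length
end

line = "h\u00e9llo\n"      # "héllo" followed by an endline
line_length_sketch(line)   # => 5
line.chomp.bytesize        # => 6, because "é" occupies two bytes in UTF-8
```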

Public: Make sure the Parser object doesn't get initialized.

Raises RuntimeError if this constructor is invoked.

Public: Return the next Section or Block object from the Reader.

Begins by skipping over blank lines to find the start of the next Section or Block. Processes each line of the reader in sequence until a Section or Block is found or the reader has no more lines.

Uses regular expressions from the Asciidoctor module to match Section and Block delimiters. The ensuing lines are then processed according to the type of content.

reader - The Reader from which to retrieve the next block
parent - The Document, Section or Block to which the next block belongs

Returns a Section or Block object holding the parsed content of the processed lines

Internal: Parse and construct a labeled (e.g., definition) list Block from the current position of the Reader

reader - The Reader from which to retrieve the labeled list
match  - The Regexp match for the head of the list
parent - The parent Block to which this labeled list belongs

Returns the Block encapsulating the parsed labeled list

Internal: Parse and construct the next ListItem for the current bulleted (unordered or ordered) list Block, callout lists included, or the next term ListItem and definition ListItem pair for the labeled list Block.

First collect and process all the lines that constitute the next list item for the parent list (according to its type). Next, parse those lines into blocks and associate them with the ListItem (in the case of a labeled list, the definition ListItem). Finally, fold the first block into the item's text attribute according to rules described in ListItem.

reader        - The Reader from which to retrieve the next list item
list_block    - The parent list Block of this ListItem. Also provides access to the list type.
match         - The match Array which contains the marker and text (first-line) of the ListItem
sibling_trait - The list marker or the Regexp to match a sibling item

Returns the next ListItem or ListItem pair (depending on the list type) for the parent list Block.

Internal: Parse and construct an outline list Block from the current position of the Reader

reader    - The Reader from which to retrieve the outline list
list_type - A Symbol representing the list type (:olist for ordered, :ulist for unordered)
parent    - The parent Block to which this outline list belongs

Returns the Block encapsulating the parsed outline (unordered or ordered) list

Public: Return the next section from the Reader.

This method processes block metadata, content and subsections for this section and returns the Section object and any orphaned attributes.

If the parent is a Document and has a header (document title), then this method will put any non-section blocks at the start of the document into a preamble Block. If there are no such blocks, the preamble is dropped.

Since we are reading line-by-line, there's a chance that metadata that should be associated with the following block gets consumed. To deal with this case, the method returns a running Hash of "orphaned" attributes that get passed to the next Section or Block.

reader     - the source Reader
parent     - the parent Section or Document of this new section
attributes - a Hash of metadata that was left orphaned from the
             previous Section.

Examples

  source
  # => "= Greetings\n\nThis is my doc.\n\n== Salutations\n\nIt is awesome."

  reader = Reader.new source, nil, :normalize => true
  # create empty document to parent the section
  # and hold attributes extracted from header
  doc = Document.new

  Parser.next_section(reader, doc).first.title
  # => "Greetings"

  Parser.next_section(reader, doc).first.title
  # => "Salutations"

returns a two-element Array containing the Section and Hash of orphaned attributes

Internal: Parse the table contained in the provided Reader

table_reader - a Reader containing the source lines of an AsciiDoc table
parent       - the parent Block of this Asciidoctor::Table
attributes   - attributes captured from above this Block

returns an instance of Asciidoctor::Table parsed from the provided reader

Public: Parses AsciiDoc source read from the Reader into the Document

This method is the main entry-point into the Parser when parsing a full document. It first looks for and, if found, processes the document title. It then proceeds to iterate through the lines in the Reader, parsing the document into nested Sections and Blocks.

reader   - the Reader holding the source lines of the document
document - the empty Document into which the lines will be parsed
options  - a Hash of options to control processing

returns the Document object

Internal: Parse the next line if it contains metadata for the following block

This method handles lines with the following content:

  • line or block comment
  • anchor
  • attribute list
  • block title

Any attributes found will be inserted into the attributes argument. If the line contains block metadata, the method returns true, otherwise false.

reader     - the source reader
parent     - the parent of the current line
attributes - a Hash of attributes in which any metadata found will be stored
options    - a Hash of options to control processing (default: {}):
             *  :text indicates that lexer is only looking for text content
                  and thus the block title should not be captured

returns true if the line contains metadata, otherwise false

Internal: Parse lines of metadata until a line of metadata is not found.

This method processes sequential lines containing block metadata, ignoring blank lines and comments.

reader     - the source reader
parent     - the parent to which the lines belong
attributes - a Hash of attributes in which any metadata found will be stored (default: {})
options    - a Hash of options to control processing (default: {}):
             *  :text indicates that lexer is only looking for text content
                  and thus the block title should not be captured

returns the Hash of attributes including any metadata found
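
The loop described above can be sketched without a Reader by working on a plain Array of lines. Everything in this sketch is an assumption for illustration: the method name, the regular expressions, and the return value (the attributes plus the unconsumed lines). It handles only line comments, [[anchor]] lines, [attribute list] lines and .Block title lines; block comments are not covered.

```ruby
# Hypothetical sketch: consume leading metadata lines from an Array of Strings.
def collect_block_metadata_sketch(lines, attributes = {})
  remaining = lines.dup
  until remaining.empty?
    case remaining.first
    when /\A\s*\z/, %r{\A//(?!/)}
      # Blank line or line comment: nothing to record.
    when /\A\[\[([^\]]+)\]\]\z/
      attributes['id'] = Regexp.last_match(1)
    when /\A\[(.*)\]\z/
      # Entries without '=' are positional (keyed by 1-based position).
      Regexp.last_match(1).split(',').each_with_index do |entry, idx|
        key, value = entry.strip.split('=', 2)
        next if key.nil?
        if value
          attributes[key] = value.delete('"')
        else
          attributes[idx + 1] = key
        end
      end
    when /\A\.([^\s.].*)\z/
      attributes['title'] = Regexp.last_match(1)
    else
      break # first line that is not metadata: stop consuming
    end
    remaining.shift
  end
  [attributes, remaining]
end

attrs, rest = collect_block_metadata_sketch(
  ['// note', '[[intro]]', '[quote, role=lead]', '.A title', 'Paragraph text']
)
# attrs => {"id"=>"intro", 1=>"quote", "role"=>"lead", "title"=>"A title"}
# rest  => ["Paragraph text"]
```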

Public: Parse blocks from this reader until there are no more lines.

This method calls Parser.next_block until there are no more lines in the Reader. It does not consider sections because it's assumed the Reader only has lines which are within a delimited block region.

reader - The Reader containing the lines to process
parent - The parent Block to which to attach the parsed blocks

Returns nothing.

Internal: Parse the cell specs for the current cell.

The cell specs dictate the cell's alignments, styles or filters, colspan, rowspan and/or repeating content.

The default spec when pos == :end is {} since we already know we're at a delimiter. When pos == :start, we may be at a delimiter; nil indicates we're not.

returns the Hash of attributes that indicate how to layout and style this cell in the table.

Internal: Parse the column specs for this table.

The column specs dictate the number of columns, relative width of columns, default alignments for cells in each column, and/or default styles or filters applied to the cells in the column.

Every column spec is guaranteed to have a width

returns a Hash of attributes that specify how to format and layout the cells in the table.

Public: Parses the document header of the AsciiDoc source read from the Reader

Reads the AsciiDoc source from the Reader until the end of the document header is reached. The Document object is populated with information from the header (document title, document attributes, etc). The document attributes are then saved to establish a save point to which to roll back after parsing is complete.

This method assumes that there are no blank lines at the start of the document, because the reader removes them automatically.

returns the Hash of orphan block attributes captured above the header

Public: Consume and parse the two header lines (line 1 = author info, line 2 = revision info).

Returns the Hash of header metadata. If a Document object is supplied, the metadata is applied directly to the attributes of the Document.

reader   - the Reader holding the source lines of the document
document - the Document we are building (default: nil)

Examples

 data = ["Author Name <author@example.org>\n", "v1.0, 2012-12-21: Coincide w/ end of world.\n"]
 parse_header_metadata(Reader.new data, nil, :normalize => true)
 # => {'author' => 'Author Name', 'firstname' => 'Author', 'lastname' => 'Name', 'email' => 'author@example.org',
 #       'revnumber' => '1.0', 'revdate' => '2012-12-21', 'revremark' => 'Coincide w/ end of world.'}
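
To make the mapping from the two header lines to the Hash concrete, here is a standalone sketch that reproduces the example above for a simple "First Last <email>" author line and a "vNUMBER, DATE: REMARK" revision line. The method name and both regular expressions are assumptions; the real method supports more forms (for example, multiple authors) that this sketch ignores.

```ruby
# Hypothetical sketch: parse an author line and a revision line into a Hash.
def parse_header_metadata_sketch(lines)
  metadata = {}
  author_line, rev_line = lines.map(&:strip)

  if author_line && !author_line.empty?
    # Pull off a trailing <email>, then split what is left into names.
    email = author_line[/<([^>]+)>\z/, 1]
    name = author_line.sub(/\s*<[^>]+>\z/, '')
    first, *rest = name.split(' ')
    metadata['author'] = name
    metadata['firstname'] = first
    metadata['lastname'] = rest.last unless rest.empty?
    metadata['email'] = email if email
  end

  # Assumed shape: optional "vNUMBER," then the date, then optional ": remark".
  if rev_line && (m = /\A(?:v?(\d[^,:]*),\s*)?([^:]+?)(?::\s*(.*))?\z/.match(rev_line))
    metadata['revnumber'] = m[1].strip if m[1]
    metadata['revdate'] = m[2].strip
    metadata['revremark'] = m[3].strip if m[3]
  end

  metadata
end

data = ["Author Name <author@example.org>\n", "v1.0, 2012-12-21: Coincide w/ end of world.\n"]
parse_header_metadata_sketch(data)
# => {"author"=>"Author Name", "firstname"=>"Author", "lastname"=>"Name", "email"=>"author@example.org",
#     "revnumber"=>"1.0", "revdate"=>"2012-12-21", "revremark"=>"Coincide w/ end of world."}
```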

Public: Parses the manpage header of the AsciiDoc source read from the Reader

returns Nothing

Internal: Parse the section title from the current position of the reader

Parse a single or double-line section title. After this method is called, the Reader will be positioned at the line after the section title.

reader   - the source reader, positioned at a section title
document - the current document

Examples

  reader.lines
  # => ["Foo", "~~~"]

  id, reftext, title, level, single = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 2
  id
  # => nil
  single
  # => false

  line1
  # => "==== Foo"

  id, reftext, title, level, single = parse_section_title(reader, document)

  title
  # => "Foo"
  level
  # => 3
  id
  # => nil
  single
  # => true

returns an Array of [String, String, String, Integer, Boolean], representing the id, reftext, title, level and single-line flag of the Section, or nil.

Public: Parse the first positional attribute and assign named attributes

Parse the first positional attribute to extract the style, role and id parts, assign the values to their corresponding attribute keys and return both the original style attribute and the parsed value from the first positional attribute.

attributes - The Hash of attributes to process and update

Examples

  puts attributes
  => {1 => "abstract#intro.lead%fragment", "style" => "preamble"}

  parse_style_attribute(attributes)
  => ["abstract", "preamble"]

  puts attributes
  => {1 => "abstract#intro.lead", "style" => "abstract", "id" => "intro",
        "role" => "lead", "options" => ["fragment"], "fragment-option" => ''}

Returns a two-element Array of the parsed style from the first positional attribute and the original style that was replaced
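
The shorthand in the first positional attribute can be read as a style followed by parts introduced by '#' (id), '.' (role) and '%' (option). The sketch below tokenizes it on that assumption. The method name and the regular expression are illustrative, and unlike the example above the sketch leaves the positional attribute itself untouched.

```ruby
# Hypothetical sketch: split "style#id.role%option" and update the attributes Hash.
def parse_style_attribute_sketch(attributes)
  original_style = attributes['style']
  style = nil
  roles = []
  options = []

  # Each token is an optional prefix character followed by its value.
  attributes[1].to_s.scan(/([#.%]?)([^#.%]+)/) do |prefix, value|
    case prefix
    when '#' then attributes['id'] = value
    when '.' then roles << value
    when '%' then options << value
    else style = value
    end
  end

  attributes['style'] = style if style
  attributes['role'] = roles.join(' ') unless roles.empty?
  unless options.empty?
    attributes['options'] = options
    options.each { |opt| attributes["#{opt}-option"] = '' }
  end

  [style, original_style]
end

attributes = { 1 => 'abstract#intro.lead%fragment', 'style' => 'preamble' }
parse_style_attribute_sketch(attributes)
# => ["abstract", "preamble"]
```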

Internal: Parse the author line into a Hash of author metadata

author_line - the String author line
names_only  - a Boolean flag that indicates whether to process line as
              names only or names with emails (default: false)
multiple    - a Boolean flag that indicates whether to process multiple
              semicolon-separated entries in the author line (default: true)

returns a Hash of author metadata

Internal: Collect the lines belonging to the current list item, navigating through all the rules that determine what comprises a list item.

Grab lines until a sibling list item is found, or the block is broken by a terminator (such as a line comment). Definition lists are greedier when the item has no inline text, since they still need to find that text in the lines that follow.

reader        - The Reader from which to retrieve the lines.
list_type     - The Symbol context of the list (:ulist, :olist, :colist or :dlist)
sibling_trait - A Regexp that matches a sibling of this list item or String list marker
                of the items in this list (default: nil)
has_text      - Whether the list item has text defined inline (always true except for labeled lists)

Returns an Array of lines belonging to the current list item.

Internal: Resolve the 0-index marker for this list item

For ordered lists, match the marker used for this list item against the known list markers and determine which marker is the first (0-index) marker in its number series.

For callout lists, return <1>.

For bulleted lists, return the marker as passed to this method.

list_type - The Symbol context of the list
marker    - The String marker for this list item
ordinal   - The position of this list item in the list
validate  - Whether to validate the value of the marker

Returns the String 0-index marker for this list item

Internal: Resolve the 0-index marker for this ordered list item

Match the marker used for this ordered list item against the known ordered list markers and determine which marker is the first (0-index) marker in its number series.

The purpose of this method is to normalize the implicit numbered markers so that they can be compared against other list items.

marker   - The marker used for this list item
ordinal  - The 0-based index of the list item (default: 0)
validate - Perform validation that the marker provided is the proper
           marker in the sequence (default: false)

Examples

 marker = 'B.'
 Parser.resolve_ordered_list_marker(marker, 1, true)
 # => 'A.'

Returns the String of the first marker in this number series
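
One way to picture the normalization is a table that pairs a pattern for each numbering series with the first marker of that series. The sketch below assumes five series (arabic, lower and upper alpha, lower and upper roman); the table, the method name and the limited validation (arabic and alphabetic markers only, reported through warn) are assumptions rather than the library's behavior.

```ruby
# Assumed table: pattern for each numbering series and the first marker in it.
ORDERED_MARKERS_SKETCH = [
  [/\A\d+\.\z/, '1.'],
  [/\A[a-z]\.\z/, 'a.'],
  [/\A[A-Z]\.\z/, 'A.'],
  [/\A[ivx]+\)\z/, 'i)'],
  [/\A[IVX]+\)\z/, 'I)']
].freeze

# Hypothetical sketch: return the first marker of the series the marker belongs to.
def resolve_ordered_list_marker_sketch(marker, ordinal = 0, validate = false)
  _rx, first = ORDERED_MARKERS_SKETCH.find { |rx, _| rx =~ marker }
  # Markers outside the table (such as the implicit '.') pass through unchanged.
  return marker unless first

  if validate
    expected = case first
               when '1.' then "#{ordinal + 1}."
               when 'a.' then "#{('a'.ord + ordinal).chr}."
               when 'A.' then "#{('A'.ord + ordinal).chr}."
               end
    warn "expected #{expected}, got #{marker}" if expected && expected != marker
  end

  first
end

resolve_ordered_list_marker_sketch('B.', 1, true)  # => "A."
resolve_ordered_list_marker_sketch('3.', 2)        # => "1."
```

Reducing every marker to the first one in its series lets items that belong to the same list compare as equal even though their literal markers differ.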

Internal: Converts a Roman numeral to an integer value.

value - The String Roman numeral to convert

Returns the Integer for this Roman numeral
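
For reference, a common way to do this conversion is to subtract any digit that is followed by a larger one and add the rest. The sketch below uses that standard technique with a hypothetical method name; it is not taken from the Asciidoctor source and does not reject malformed numerals.

```ruby
ROMAN_VALUES_SKETCH = {
  'I' => 1, 'V' => 5, 'X' => 10, 'L' => 50, 'C' => 100, 'D' => 500, 'M' => 1000
}.freeze

# Hypothetical sketch: convert a Roman numeral String (any case) to an Integer.
def roman_to_int_sketch(value)
  digits = value.upcase.each_char.map { |ch| ROMAN_VALUES_SKETCH.fetch(ch) }
  total = 0
  digits.each_with_index do |digit, idx|
    following = digits[idx + 1]
    # A smaller digit before a larger one is subtracted (IV = 4, IX = 9).
    total += following && following > digit ? -digit : digit
  end
  total
end

roman_to_int_sketch('xiv')      # => 14
roman_to_int_sketch('MCMXCIV')  # => 1994
```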

Public: Convert a string to a legal attribute name.

name - the String name of the attribute

Returns a String with the legal AsciiDoc attribute name.

Examples

  sanitize_attribute_name('Foo Bar')
  => 'foobar'

  sanitize_attribute_name('foo')
  => 'foo'

  sanitize_attribute_name('Foo 3 #-Billy')
  => 'foo3-billy'
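
The examples are consistent with a simple rule: drop every character that is not a word character or a hyphen, then lowercase the result. The sketch below implements that rule under a hypothetical name; the exact pattern is an assumption inferred from the examples.

```ruby
# Hypothetical sketch: keep only word characters and hyphens, then lowercase.
def sanitize_attribute_name_sketch(name)
  name.gsub(/[^\w\-]/, '').downcase
end

sanitize_attribute_name_sketch('Foo 3 #-Billy')
# => "foo3-billy"
```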

Private: Get the Integer section level based on the characters used in the ASCII line under the section title.

line - the String line from under the section title.

Public: Store the attribute in the document and register attribute entry if accessible

name  - the String name of the attribute to store
value - the String value of the attribute to store
doc   - the Document being parsed
attrs - the attributes for the current context

returns a 2-element array containing the attribute name and value
