class Nokogiri::HTML5::Document
Since v1.12.0
💡 HTML5 functionality is not available when running JRuby.
Attributes
Get the parser’s quirks mode value. See HTML5::QuirksMode.
This method returns `nil` if the parser was not invoked (e.g., `Nokogiri::HTML5::Document.new`).
Since v1.14.0
Get the URL for this document, as passed into Document.parse, Document.read_io, or Document.read_memory.
Public Class Methods
Parse HTML5 input.
- Parameters
-
input may be a String, or any object that responds to read and close, such as an IO or StringIO.
-
url (optional) is a String indicating the canonical URI where this document is located.
-
encoding (optional) is the encoding that should be used when processing the document.
-
options (optional) is a configuration Hash (or keyword arguments) to set options during parsing. The three currently supported options are :max_errors, :max_tree_depth, and :max_attributes, described at Nokogiri::HTML5.
⚠ Note that these options are different than those made available by Nokogiri::XML::Document and Nokogiri::HTML4::Document.
-
block (optional) is passed a configuration Hash on which parse options may be set. See Nokogiri::HTML5 for more information and usage.
- Returns
-
Nokogiri::HTML5::Document
# File lib/nokogiri/html5/document.rb, line 80
def parse(string_or_io, url = nil, encoding = nil, **options, &block)
  yield options if block
  string_or_io = "" unless string_or_io

  if string_or_io.respond_to?(:encoding) && string_or_io.encoding != Encoding::ASCII_8BIT
    encoding ||= string_or_io.encoding.name
  end

  if string_or_io.respond_to?(:read) && string_or_io.respond_to?(:path)
    url ||= string_or_io.path
  end
  unless string_or_io.respond_to?(:read) || string_or_io.respond_to?(:to_str)
    raise ArgumentError, "not a string or IO object"
  end

  do_parse(string_or_io, url, encoding, options)
end
Create a new document from an IO object.
💡 Most users should prefer Document.parse to this method.
# File lib/nokogiri/html5/document.rb, line 101
def read_io(io, url = nil, encoding = nil, **options)
  raise ArgumentError, "io object doesn't respond to :read" unless io.respond_to?(:read)

  do_parse(io, url, encoding, options)
end
Create a new document from a String.
💡 Most users should prefer Document.parse to this method.
# File lib/nokogiri/html5/document.rb, line 110
def read_memory(string, url = nil, encoding = nil, **options)
  raise ArgumentError, "string object doesn't respond to :to_str" unless string.respond_to?(:to_str)

  do_parse(string, url, encoding, options)
end
Private Class Methods
# File lib/nokogiri/html5/document.rb, line 118
def do_parse(string_or_io, url, encoding, options)
  string = HTML5.read_and_encode(string_or_io, encoding)
  max_attributes = options[:max_attributes] || Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES
  max_errors = options[:max_errors] || options[:max_parse_errors] || Nokogiri::Gumbo::DEFAULT_MAX_ERRORS
  max_depth = options[:max_tree_depth] || Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH
  doc = Nokogiri::Gumbo.parse(string, url, max_attributes, max_errors, max_depth, self)
  doc.encoding = "UTF-8"
  doc
end
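The option lookup in the method above can be illustrated without Nokogiri at all. The following plain-Ruby sketch mirrors the `||` fallback chain. The helper name `resolve_limits` is hypothetical, and the default values are placeholders standing in for the constants defined in Nokogiri::Gumbo.

```ruby
# Placeholder defaults for illustration only; the real values live in Nokogiri::Gumbo.
DEFAULT_MAX_ATTRIBUTES = 400
DEFAULT_MAX_ERRORS = 0
DEFAULT_MAX_TREE_DEPTH = 400

# Hypothetical helper that mirrors the option lookup performed by do_parse.
def resolve_limits(options)
  {
    max_attributes: options[:max_attributes] || DEFAULT_MAX_ATTRIBUTES,
    # :max_errors wins; :max_parse_errors is consulted only when :max_errors is nil or false.
    max_errors: options[:max_errors] || options[:max_parse_errors] || DEFAULT_MAX_ERRORS,
    max_tree_depth: options[:max_tree_depth] || DEFAULT_MAX_TREE_DEPTH,
  }
end

defaults = resolve_limits({})
legacy = resolve_limits({ max_parse_errors: 5 })
both = resolve_limits({ max_errors: 2, max_parse_errors: 5 })
explicit_zero = resolve_limits({ max_errors: 0, max_parse_errors: 5 })

p defaults
p legacy[:max_errors]
p both[:max_errors]
p explicit_zero[:max_errors]
```

Because `0` is truthy in Ruby, an explicit `max_errors: 0` is kept rather than falling through to `:max_parse_errors` or the default. Only `nil` or `false` trigger the fallback.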
Public Instance Methods
Parse an HTML5 document fragment from markup, returning a Nokogiri::HTML5::DocumentFragment.
- Properties
-
markup (String) The HTML5 markup fragment to be parsed
- Returns
-
Nokogiri::HTML5::DocumentFragment. This object’s children will be empty if `markup` is not passed, is empty, or is `nil`.
# File lib/nokogiri/html5/document.rb, line 147
def fragment(markup = nil)
  DocumentFragment.new(self, markup)
end
- Returns
-
The document type which determines CSS-to-XPath translation.
See CSS::XPathVisitor for more information.
# File lib/nokogiri/html5/document.rb, line 163
def xpath_doctype
  Nokogiri::CSS::XPathVisitor::DoctypeConfig::HTML5
end