Simple examples using Closure HTML.
- Parsing a string
- Parsing a file
- Cleaning up broken HTML
- Translating an HTML file to XHTML
- Translating an XHTML file to HTML
- Fetching and parsing Google search results
Parsing a string
Parse into LHTML:
* (chtml:parse "<p>nada</p>" (chtml:make-lhtml-builder))
=> (:HTML NIL (:HEAD NIL) (:BODY NIL (:P NIL "nada")))
Serialize LHTML back into a string:
* (chtml:serialize-lhtml * (chtml:make-string-sink))
=> "<HTML><HEAD></HEAD><BODY><P>nada</P></BODY></HTML>"
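Because LHTML is ordinary list structure, it can be walked with plain Common Lisp. The following is a minimal sketch that collects the text of a tree. It assumes only the shape visible in the result above, namely that an element is a list (NAME ATTRIBUTES . CHILDREN) and a text node is a string. The function name lhtml-text is made up for this illustration and is not part of Closure HTML.

```lisp
;; Hypothetical helper, not part of Closure HTML.
;; Assumes an element looks like (NAME ATTRIBUTES . CHILDREN)
;; and that text nodes are plain strings.
(defun lhtml-text (node)
  "Return the concatenated text of NODE, a string or an LHTML element."
  (cond ((stringp node) node)
        ((consp node)
         ;; Skip the element name and the attribute list,
         ;; then recurse into the children.
         (apply #'concatenate 'string
                (mapcar #'lhtml-text (cddr node))))
        (t "")))

(lhtml-text '(:html nil (:head nil) (:body nil (:p nil "nada"))))
;; => "nada"
```

The helper needs no library calls, so it can be tried on a quoted list before any parsing is involved.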
Parsing a file
Note that the filename must be passed as a pathname (written using #p), not just a string, because a string would be interpreted as a literal HTML document as in the first example above.
* (chtml:parse #p"example.html" (chtml:make-lhtml-builder))
=> (:HTML NIL (:HEAD NIL) (:BODY NIL (:P NIL "nada")))
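To avoid passing a bare string by accident, the argument can be coerced with the standard function pathname before parsing. This is only a sketch: the wrapper name parse-html-file is hypothetical, and the closure-html system is assumed to be loaded already.

```lisp
;; Hypothetical wrapper, not part of Closure HTML.
;; Assumes the closure-html system has already been loaded.
(defun parse-html-file (file)
  "Parse FILE, given as a namestring or a pathname, into LHTML."
  ;; PATHNAME turns a string into a pathname and returns a pathname
  ;; unchanged, so the argument is never read as literal HTML.
  (chtml:parse (pathname file) (chtml:make-lhtml-builder)))

(parse-html-file "example.html")
```

With this wrapper, "example.html" and #p"example.html" are treated the same way.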
Cleaning up broken HTML
Many HTML syntax errors are corrected by Closure HTML automatically. In this example, we parse from a string and serialize it back immediately.
* (defun clean-html (string) (chtml:parse string (chtml:make-string-sink)))
=> CLEAN-HTML
Note the differences between input and output in the following document:
- <title> is moved into <head>.
- The bogus attribute is removed.
- <br is corrected to <br> and </oops> to </p>.
* (clean-html "<title>cleanup example</title> <p bogus> <br </oops>")
=> "<HTML><HEAD><TITLE>cleanup example</TITLE></HEAD><BODY><P> <BR></P></BODY></HTML>"
Translating an HTML file to XHTML
In this example, we parse an HTML file and serialize it into XHTML.
This example uses Closure XML.
* (defun html2xhtml (file &key (if-exists :error))
    (with-open-file (out (make-pathname :type "xml" :defaults file)
                         :element-type '(unsigned-byte 8)
                         :if-exists if-exists
                         :direction :output)
      (chtml:parse (pathname file)
                   (cxml:make-octet-stream-sink out))))
=> HTML2XHTML
Use like this:
* (html2xhtml "/home/david/test.html" :if-exists :supersede)
The following input file and its XHTML version illustrate some of the differences between the two syntaxes.
test.html:
<p>foo</p> <br> <br> <br> <select> <option selected>123 <option>456 </select>
test.xml:
<?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>foo</p> <br/> <br/> <br/> <select><option selected="selected">123 </option><option>456 </option></select> </body></html>
Translating an XHTML file to HTML
This is the reverse of the previous example, in which we converted an HTML file to XHTML. Going back to HTML is just as easy:
* (defun xhtml2html (file &key (if-exists :error))
    (with-open-file (out (make-pathname :type "html" :defaults file)
                         :element-type '(unsigned-byte 8)
                         :if-exists if-exists
                         :direction :output)
      (cxml:parse (pathname file)
                  (chtml:make-octet-stream-sink out))))
=> XHTML2HTML
Running this function on the example above results in a cleaned-up version of the original document:
test.html:
<html><head></head><body><p>foo</p> <br> <br> <br> <select><option selected>123 </option><option>456 </option></select> </body></html>
Fetching and parsing Google search results
In this example, we perform a Google search and print the first ten results by looking for all links of the form <a class="l">.
This example uses Drakma to perform the HTTP request, and the DOM alternative cxml-stp.
* (defun show-google-hits (term)
    (let* ((query (list (cons "q" term)))
           (str (drakma:http-request "http://www.google.com/search"
                                     :parameters query))
           (document (chtml:parse str (cxml-stp:make-builder))))
      (stp:do-recursively (a document)
        (when (and (typep a 'stp:element)
                   (equal (stp:local-name a) "a")
                   (equal (stp:attribute-value a "class") "l"))
          (format t "~A:~% ~A~%~%"
                  (stp:string-value a)
                  (stp:attribute-value a "href"))))))
=> SHOW-GOOGLE-HITS
Searching for "lisp" we get these results:
* (show-google-hits "lisp")
=> Lisp (programming language) - Wikipedia, the free encyclopedia:
 http://en.wikipedia.org/wiki/Lisp_programming_language

Lisp - Wikipedia, the free encyclopedia:
 http://en.wikipedia.org/wiki/Lisp

Association of Lisp Users:
 http://www.lisp.org/

An Introduction and Tutorial for Common Lisp:
 http://www.apl.jhu.edu/~hall/lisp.html

Lisp:
 http://www.paulgraham.com/lisp.html

The Roots of Lisp:
 http://www.paulgraham.com/rootsoflisp.html

Planet Lisp:
 http://planet.lisp.org/

Practical Common Lisp:
 http://www.gigamonkeys.com/book/

CLISP - an ANSI Common Lisp Implementation:
 http://clisp.cons.org/

Lisp FAQ:
 http://www.cs.cmu.edu/Groups/AI/html/faqs/lang/lisp/top.html
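The same traversal can be tried without a network request by parsing an HTML string directly. The sketch below returns the links as a list instead of printing them. It uses only the calls that appear in the example above; the function name collect-links is hypothetical, and the closure-html and cxml-stp systems are assumed to be loaded.

```lisp
;; Hypothetical helper built from the same calls as show-google-hits.
;; Assumes the closure-html and cxml-stp systems are already loaded.
(defun collect-links (html)
  "Return a list of (TEXT . HREF) pairs for the <a> elements in HTML."
  (let ((document (chtml:parse html (cxml-stp:make-builder)))
        (links '()))
    (stp:do-recursively (node document)
      (when (and (typep node 'stp:element)
                 (equal (stp:local-name node) "a"))
        (let ((href (stp:attribute-value node "href")))
          ;; Anchors without an href attribute are skipped.
          (when href
            (push (cons (stp:string-value node) href) links)))))
    ;; PUSH builds the list backwards; restore document order.
    (nreverse links)))

(collect-links "<p><a href=\"http://www.lisp.org/\">ALU</a></p>")
```

Returning a list rather than printing keeps the traversal separate from the output format, so the result can be filtered or formatted by the caller.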