A great tool for accomplishing just this is the open source program HTMLDOC
(http://www.htmldoc.org/), which converts HTML documents to indexed HTML,
Adobe PostScript, and PDF files. HTMLDOC can be invoked from the command line,
like so:
%>htmldoc --webpage -f webpage.pdf http://www.wjgilmore.com/
This would result in the creation of a PDF named webpage.pdf, which would
contain a snapshot of the Web site's index page. Of course, most users will not have
command-line access to your server; therefore, you'll need to create a much more
controlled interface, such as a Web page. Using PHP's passthru() function (introduced
in the later section "PHP's Program Execution Functions"), you can call HTMLDOC and
return the desired PDF, like so:
$document = $_POST['userurl'];
passthru("htmldoc --webpage -f webpage.pdf $document");
Now suppose an enterprising attacker took the liberty of passing additional
input, unrelated to the desired HTML page, by entering something like this:
http://www.wjgilmore.com/ ; cd /usr/local/apache/htdocs/; rm -rf *
Most Unix shells would interpret the passthru() request as three separate
commands. The first is this:
htmldoc --webpage -f webpage.pdf http://www.wjgilmore.com/
The second command is this:
cd /usr/local/apache/htdocs/
And the final command is this:
rm -rf *
The last two commands are certainly unexpected and could result in the deletion
of your entire Web document tree.
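To see why the shell splits the request, it can help to look at the command string itself before anything is executed. The following is only a minimal sketch: buildHtmldocCommand() is a hypothetical helper written for this illustration, and it assumes a Unix-like server, where PHP's escapeshellarg() function wraps its argument in single quotes. Nothing is passed to passthru() here; the script only builds and prints the two strings.

```php
<?php
// Hypothetical helper (not part of the original example): builds the
// HTMLDOC command line without running it, so the effect of escaping
// the user input can be inspected safely.
function buildHtmldocCommand(string $url, bool $escape): string
{
    // escapeshellarg() quotes the value so that the shell treats it as
    // a single argument rather than as part of the command syntax.
    $argument = $escape ? escapeshellarg($url) : $url;
    return "htmldoc --webpage -f webpage.pdf " . $argument;
}

// The malicious input from the scenario described above.
$document = "http://www.wjgilmore.com/ ; cd /usr/local/apache/htdocs/; rm -rf *";

// Unescaped: both semicolons reach the shell as command separators.
$unsafe = buildHtmldocCommand($document, false);

// Escaped: the entire input, semicolons included, is one quoted argument.
$safe = buildHtmldocCommand($document, true);

echo $unsafe, PHP_EOL;
echo $safe, PHP_EOL;
```

In the escaped version, HTMLDOC would simply receive one odd-looking URL and most likely fail to retrieve it, rather than the shell running cd and rm. Quoting alone is not a complete defense (for instance, it does not check that the input is really a URL), so treat this as a starting point rather than a finished safeguard.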