Speaking in code
November 2008
Having dealt with many different markup languages used in user-submitted content over the years, as both a web user and a web developer, and currently working on a project or two that will need a markup language for forum posts and comments, I think it is a good time for a whinge, er, discussion about which of the major markup systems is best for your users and for your developers.
BBCode
BBCode is primarily used in bulletin board (aka forum) software to allow people to format and style their text using easy-to-remember style codes without having to know pure HTML. It involves using a parsing engine which replaces specific codes with HTML upon submission or, in most software, when displaying the post. Generally, the code is made up of square brackets surrounding a particular set of letters. For example:
[b]Something[/b] is grinding my [u]gears[/u]. [url=http://www.google.com]Click here to see.[/url]
This block of text above, once run through the BBCode parser, will become:
<strong>Something</strong> is grinding my <span style="text-decoration: underline;">gears</span>. <a href="http://www.google.com">Click here to see.</a>
Whilst it is easy for your users to remember, it does require your developers to write a very robust BBCode parser: firstly to make sure it generates valid HTML, and secondly to make sure people don’t use BBCode in ways it was not intended to be used.
Another problem with BBCode is that the language has no standard: different software packages accept different BBCode combinations. This makes it hard for users to learn one BBCode dialect and then move to other software without having to re-learn things. This can cause confusion, especially for people who post regularly across a wide variety of sites: post the same message on each site and you will quite possibly have to make small edits to the formatting codes for it to display correctly.
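To give a feel for what such a parser involves, here is a minimal sketch in Python (chosen for brevity; the same idea carries over to PHP). It handles only the three tags from the example above. The names bbcode_to_html, SIMPLE_TAGS and URL_TAG are made up for this illustration and do not come from phpBB, vBulletin or any other package.

```python
import html
import re

# Argument-less tags, mapped to the HTML used in the example above.
SIMPLE_TAGS = {
    "b": ("<strong>", "</strong>"),
    "u": ('<span style="text-decoration: underline;">', "</span>"),
}

# Only http:// and https:// targets are accepted, so a
# [url=javascript:...] tag is left as plain text instead of becoming a link.
URL_TAG = re.compile(
    r"\[url=(https?://[^\]\s\"]+)\](.*?)\[/url\]",
    re.IGNORECASE | re.DOTALL,
)


def bbcode_to_html(text: str) -> str:
    # Escape first, so any raw HTML the user typed is displayed, not executed.
    out = html.escape(text, quote=True)

    # Replace each complete [tag]...[/tag] pair; unclosed tags stay as typed.
    for tag, (open_html, close_html) in SIMPLE_TAGS.items():
        pattern = re.compile(
            rf"\[{tag}\](.*?)\[/{tag}\]", re.IGNORECASE | re.DOTALL
        )
        out = pattern.sub(
            lambda m, o=open_html, c=close_html: o + m.group(1) + c, out
        )

    # Turn [url=...]label[/url] into an anchor.
    out = URL_TAG.sub(
        lambda m: f'<a href="{m.group(1)}">{m.group(2)}</a>', out
    )
    return out


sample = (
    "[b]Something[/b] is grinding my [u]gears[/u]. "
    "[url=http://www.google.com]Click here to see.[/url]"
)
print(bbcode_to_html(sample))
```

Even this small sketch has to escape raw HTML and restrict link targets, and it still does not check that tags are nested correctly: [b][u]text[/b][/u] would come out as badly nested HTML. That gap is exactly the sort of thing a robust parser has to close.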
For some examples of software which use BBCode for posting, check out phpBB or vBulletin.
Wiki Code
Wiki Code is usually used in conveniently-named Wiki software, like MediaWiki, the software which powers the 3rd God of the Internet, Wikipedia. Wiki Code generally uses symbols to do markup, in a way which is supposed to emulate actual writing as closely as possible. For example:
''Italicized text'' is not as noticeable as '''bolded text'''. ___Underlined text___ is readable whereas ---struck-out text--- is not supposed to be. This is a [http://www.google.com link].
When run through the Wiki Code parser, this should look like:
<em>Italicized text</em> is not as noticeable as <strong>bolded text</strong>. <span style="text-decoration: underline;">Underlined text</span> is readable whereas <span style="text-decoration: line-through;">struck-out text</span> is not supposed to be. This is a <a href="http://www.google.com">link</a>.
So what exactly is wrong with this? Well, once again, there is no standardized language for Wiki Code; it’s all up in the air which one is the one to use. MediaWiki’s syntax is arguably the one most people meet in Wiki software, but without a standard to go by, the formatting could change depending on which site you use.
Another, more glaring, problem with Wiki Code is the likelihood of false positives in the Wiki Code parser. You see, because Wiki Code is marked up using symbols and punctuation used in normal sentences without any sort of delimiter (like the square brackets used in BBCode), it would be very easy to make a mistake by using punctuation even in normal cases and have the Wiki Code parser change the formatting around where you don’t want it to. Admittedly, the most widely used Wiki Code parser, MediaWiki, is quite good at intercepting and dodging problems, but even it has problems with some usage. For example, if you wished to use square brackets inside sample PHP code to indicate arrays, like so:
$var['name'] = 'Foo';
You would find that the Wiki Code parser would inadvertently treat the array index as a link and render it thusly:
$var<a href="http://'name'">'name'</a> = 'Foo';
Whilst nearly all Wiki Code parsers out there allow you to set the parser to ignore certain areas, usually through the use of a <nowiki> tag, it does present the issue of having hard-to-read source code as it is littered with <nowiki> tags.
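The false positive and the <nowiki> escape hatch can both be sketched in a few lines of Python. The link rule below is deliberately naive and is an assumption made for this illustration, not MediaWiki's real grammar; the function names are hypothetical too.

```python
import re

# Naive rule (illustration only): anything shaped like [target] or
# [target label] is treated as a link.
NAIVE_LINK = re.compile(r"\[([^\]\s]+)(?:\s+([^\]]+))?\]")

# Regions the parser must leave untouched.
NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)


def render_link(match):
    target = match.group(1)
    label = match.group(2) or target
    # Targets without a scheme get http:// prepended.
    if not target.startswith(("http://", "https://")):
        target = "http://" + target
    return f'<a href="{target}">{label}</a>'


def naive_wiki_links(text):
    """Apply the link rule everywhere, false positives included."""
    return NAIVE_LINK.sub(render_link, text)


def wiki_links_with_nowiki(text):
    """Apply the link rule only outside <nowiki>...</nowiki> regions."""
    # With one capture group, re.split alternates between text outside
    # the tags (even indexes) and text inside them (odd indexes).
    parts = NOWIKI.split(text)
    return "".join(
        part if index % 2 else naive_wiki_links(part)
        for index, part in enumerate(parts)
    )


code = "$var['name'] = 'Foo';"
print(naive_wiki_links(code))
print(wiki_links_with_nowiki("<nowiki>" + code + "</nowiki>"))
```

The first call mangles the PHP line into a link, while the second returns it untouched, at the cost of the author having to remember the wrapper around every such snippet.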
HTML
My personal choice is just allowing visitors to use straight-up HTML. “Why?”, I hear you ask. The abundance of documentation and examples on the web for HTML would be one reason. Presentable, easy-to-read code is a second. A third is that there is no need for any extra PHP execution, as the parsing is done by the browser rather than by the backend. But the main reason is that it is standardized, so any two sites allowing HTML in user-posted content should, by definition, accept the same posted content without any changes needing to be made. That is correct in theory, but there are a few caveats which I will explain shortly.
As a developer, allowing users to post in straight HTML is a no-brainer. You can display any posted content stored in a database with nothing more than a PHP echo call. It also saves the developer from having to use, or develop, a parsing engine in PHP to convert the posted content into HTML for display, as the posted content is already HTML.
But there are things to watch out for. Having no security checks in place when people submit code could open the floodgates to all sorts of mischievous behavior: people posting JavaScript code, flashing text and so on, or even entering styling code that completely changes the design of your website. At the very least, you would want to implement some sort of system to intercept and remove or nullify any HTML which could cause issues with your site. The PHP strip_tags function strips HTML tags and lets you specify which tags it should leave in place. However, it does not touch the attributes of the tags it keeps, and that is another problem. Especially for hyperlinks, the ‘onclick’ event could be used to execute JavaScript code, even if you have disallowed the <script> tag. There are plenty of functions on the net that filter attributes as well as tags, allowing you to specify which attributes are permitted per tag (I will post up an example of one of these functions in another post).
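As a rough idea of what filtering tags and attributes together looks like, here is a minimal allowlist sketch in Python using the standard library's html.parser module. It is not the function promised above; the ALLOWED table, the list of safe link schemes and the name sanitize_html are all assumptions made for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes permitted on that tag.
ALLOWED = {
    "strong": set(),
    "em": set(),
    "p": set(),
    "a": {"href"},
}
# Link targets must start with one of these (assumption for this sketch).
SAFE_SCHEMES = ("http://", "https://")
# Tags whose contents are dropped along with the tag itself.
DROP_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # e.g. onclick, style
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: links
            kept.append(f' {name}="{escape(value, quote=True)}"')
        attr_html = "".join(kept)
        self.out.append(f"<{tag}{attr_html}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(text):
    parser = AllowlistSanitizer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


dirty = '<a href="http://www.google.com" onclick="alert(1)">Click</a><script>alert(2)</script>'
print(sanitize_html(dirty))
```

The sketch keeps the link but drops the onclick attribute and the whole script element. It does not balance unclosed or stray closing tags, so treat it as a starting point rather than a complete filter.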
HTML can be daunting for the new user, especially one who has not had extensive experience with the Internet. It can put some people off, but many sites which allow HTML in posted content provide some documentation or buttons to ease the task. For example, a Bold button could put <strong> tags around the currently selected text.
As long as you understand the risks with HTML and know how to code it right, and provide documentation for novices, you will find that straight HTML is the most widely known and effective of the markup languages available for user content.
