c# version of nl2br that is "html safe" 
Felix Rex
Joined: Fri Mar 28, 2003 6:01 pm
Posts: 16646
Location: On a slope
Basically, c# lacks an nl2br function. Simple enough to remedy.

Code:
    protected string nl2br(string input)
    {
        string output = input.Replace("\n", "<br />");
        return output;
    }


*Note: I didn't use Environment.NewLine because on Windows it contains both \n and \r, which can result in two <br /> tags per physical newline.

However, I then found it necessary to support HTML tags. For properly formatted HTML such as styles, tables or lists, all the extra newlines would produce a huge number of superfluous <br /> tags that would all get stuck at the top of the element when displayed.

I had a hack in place for a while but then came up with this. There are obviously limitations. However, it even handles nested tags.

Code:
protected string nl2br_htmlsafe(string input)
{
    string returnstring = null; string addnewline = null; int lastLineNull = 0;
    ArrayList tagArray = new ArrayList();

    string[] working = input.Split("\r".ToCharArray()); //break input on \r
    working = (working.Length == 1) ? input.Split("\n".ToCharArray()) : working;

    string[,] htmlExempt = new string[,] {
        {"<table", "</table" },
        {"<style", "</style" },
        {"<ul", "</ul" },
        {"<ol", "</ol"}
    }; //create an array of opening and closing tags to look for

    foreach (string line in working)    //iterate through each line
    {
        for (int i = 0; i < htmlExempt.Length / 2; i++)
        {
            if (line.Contains(htmlExempt[i, 0]))
            {
                tagArray.Add(htmlExempt[i, 0]);  //if an opening tag is found, add the tag to the arrayList
            }
        }
        if (tagArray.Count > 0)
        {
            /*
            Check to see if the closing tag for the last tag opened exists.  If so, remove that tag from
            the arrayList. Loops through all the tags in case more than one tag is closed.  This doesn't
            take into account the order in which the tags are closed.               
            */
            for (int i = 0; i < tagArray.Count; i++)
            {
                if (line.Contains(tagArray[(tagArray.Count - 1)].ToString()))
                {
                    tagArray.RemoveAt(tagArray.Count - 1);
                }
            }
        }
        lastLineNull = (line.Length == 0) ? lastLineNull + 1 : 0;  //how many empty lines we have in a row
        //add a <br /> if we're not in a tag and we have no more than 2 consecutive newlines
        returnstring += (tagArray.Count > 0 && lastLineNull < 2) ?
            line + "<br />" + Environment.NewLine :
            line + Environment.NewLine;
    }
    return returnstring;
}

_________________
They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety.


Mon Dec 24, 2007 8:06 am
Felix Rex
There were several problems in that one that I've since identified. Oops. Oh well. Here's the fixed version. I also added some debugging. Flip the boolean show to true for debug output.

Code:
    protected string nl2br_htmlsafe(string input)
    {
        bool show = false;
        string returnstring = null; string addnewline = null; int lastLineNull = 0;
        ArrayList tagArray = new ArrayList();

        string[] working = input.Split("\r".ToCharArray()); //break input on \r
        working = (working.Length == 1) ? input.Split("\n".ToCharArray()) : working;

        string[,] htmlExempt = new string[,] {
            {"<table", "</table" },
            {"<style", "</style" },
            {"<ul", "</ul" },
            {"<ol", "</ol"}
        }; //create an array of opening and closing tags to look for

        foreach (string line in working)    //iterate through each line
        {
            for (int i = 0; i < htmlExempt.Length / 2; i++)
            {
                if (line.Contains(htmlExempt[i, 0]))
                {
                    tagArray.Add(htmlExempt[i, 1]);  //if an opening tag is found, add the tag to the arrayList
                    if (show) { Response.Write("found tag :: " + htmlExempt[i, 1] + "<br>"); }
                }
            }
            if (tagArray.Count > 0)
            {
                /*
                Check to see if the closing tag for the last tag opened exists.  If so, remove that tag from
                the arrayList. Loops through all the tags in case more than one tag is closed.  This doesn't
                take into account the order in which the tags are closed.               
                */
                for (int i = 0; i < tagArray.Count; i++)
                {
                    if (line.Contains(tagArray[(tagArray.Count - 1)].ToString()))
                    {
                        if (show) { Response.Write("removed tag :: " + tagArray[(tagArray.Count - 1)].ToString() + "<br>"); }
                        tagArray.RemoveAt(tagArray.Count - 1);
                    }
                }
            }
            //how many empty lines we have in a row
            lastLineNull = (line.Length == 0) ? lastLineNull + 1 : 0;
            if (show) { Response.Write(line + " :: " + tagArray.Count + " :: " + lastLineNull + "<br>"); }
            //add a <br /> if we're not in a tag and we have fewer than 2 consecutive empty lines
            returnstring += (tagArray.Count == 0 && lastLineNull < 2) ?
                line + "<br />" + Environment.NewLine :
                line + Environment.NewLine;
        }
        return returnstring;
    }



Fri Dec 28, 2007 7:37 am
Emperor
Joined: Wed Apr 16, 2003 1:25 am
Posts: 2560
Satis wrote:
\n and \r

I might look into what actually happens here later. Right now I have a weird feeling that Windows uses CR (13) first and then LF (10) to mark a new line (and not LF + CR as above). You probably just sorted them by lexical value. :P Actually, you could probably ignore (not remove, just ignore) all CRs in the text and then just replace each LF with <br />. If there are no CRs, there will be nothing to ignore. Works in all cases.

Unless... you use CRs somewhere else for another purpose (which is probably not the case). In that case you'd ignore a CR only if it is in front of an LF, which would still be dangerous, but you know better what can occur in the "text" you want to convert.
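
A minimal sketch of that idea in C#, assuming the input is an ordinary string. The names Nl2BrSketch and Nl2Br are made up for illustration and do not come from the code above. It skips every CR and emits one <br /> per LF, so \r\n and \n input both give one tag per line break. A lone CR is simply dropped, which is exactly the caveat mentioned above.

```csharp
using System.Text;

public static class Nl2BrSketch
{
    // Hypothetical helper: ignores every '\r' and turns each '\n' into "<br />".
    public static string Nl2Br(string input)
    {
        if (input == null) return string.Empty;

        var sb = new StringBuilder(input.Length + 16);
        foreach (char c in input)
        {
            if (c == '\r') continue;              // ignore CR entirely
            if (c == '\n') sb.Append("<br />");   // one tag per LF
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```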

_________________
++


Thu Jan 03, 2008 7:13 am
Felix Rex
Yeah, the \r vs \n was just what I happened to pick. My O/S of choice in this case is Windows, so they're both present. I could've chosen \n and made the whole thing more cross-O/S friendly. Ah well.



Thu Jan 03, 2008 6:48 pm
Emperor
OK, so what you're doing is inserting <br /> in place of '\n', while ignoring more than one newline in a row and doing nothing if one of the named tags is active? Unfortunately I don't have C# here (nor do I want it), but I suspect some things don't work the way you wanted them to:

- What happens if there is more than one tag of the same sort in a line?
- Are you sure it does what you want when there are three or more empty lines in a row?

On the other hand, there might be some optimizations possible:

- Do you really need an ArrayList to track which tags are still open? It's slow to search, and filling it and comparing its elements is somewhat redundant. Since you don't check whether the nesting is strict or not, you can make a simple array of 4 integers and increment/decrement its elements to record how many of each tag have been opened and remain unclosed.

- In C/C++ you can debug by defining macros.
Code:
#ifdef DEBUG_MODE
// some debugging output
#endif

The advantage of this is that "some debugging output" is not compiled into the code when DEBUG_MODE is not defined, which spares your release version a few needless "if( DEBUG_MODE )" checks. It would also save you time if you wanted to remove this debug code after you finish, because you don't have to (just undefine the DEBUG_MODE macro). And if you ever come back to the code, you can use this debug mode again without having to reimplement it.
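
Going back to the first optimization point, the array-of-integers idea could look roughly like this in C#. This is only a sketch under assumptions: the names TagCounterSketch, Update and AnyOpen are made up, the tag list is the one from the posted code, and matching is by simple prefix (so "<ul" would also match a hypothetical "<ulx"), same as the Contains() calls above.

```csharp
using System;

public static class TagCounterSketch
{
    // Same four exempt tags as in the posted code, in the same order.
    static readonly string[] Names = { "table", "style", "ul", "ol" };

    // Counts non-overlapping occurrences of needle in line, ignoring case.
    static int CountOccurrences(string line, string needle)
    {
        int count = 0, pos = 0;
        while ((pos = line.IndexOf(needle, pos, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            count++;
            pos += needle.Length;
        }
        return count;
    }

    // Adds the openings and subtracts the closings found on this line.
    // open must have one element per entry in Names.
    public static void Update(int[] open, string line)
    {
        for (int i = 0; i < Names.Length; i++)
        {
            open[i] += CountOccurrences(line, "<" + Names[i]);
            open[i] -= CountOccurrences(line, "</" + Names[i]);
            if (open[i] < 0) open[i] = 0;   // stray closing tag: don't go negative
        }
    }

    // True while at least one exempt tag is still unclosed.
    public static bool AnyOpen(int[] open)
    {
        foreach (int n in open)
        {
            if (n > 0) return true;
        }
        return false;
    }
}
```

The caller would keep one int[4] for the whole input, call Update once per line, and only add a <br /> while AnyOpen returns false.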



Sat Jan 05, 2008 1:44 am
Felix Rex
For your questions (in reverse order)...

debug mode.... I actually set the debug variable as part of my main class as a property. So I just go up there, set true to false and all my debug output stops. So swapping it is easy. Regarding the debug markers like in c / c++, I'm not sure if c# supports that. I wouldn't be surprised if it does, there are so many parts of c# I don't know about yet.
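
For what it's worth, C# does have conditional compilation: the #if / #endif directives, plus the [Conditional] attribute from System.Diagnostics. A small sketch, assuming a symbol called DEBUG_MODE (a made-up name, as are DebugSketch, DebugLog and CountLines). The symbol would be defined with #define DEBUG_MODE at the very top of the source file or through the compiler's define option.

```csharp
using System;
using System.Diagnostics;

public static class DebugSketch
{
    // Calls to this method are dropped by the compiler at the call site
    // when DEBUG_MODE is not defined where the caller is compiled.
    [Conditional("DEBUG_MODE")]
    public static void DebugLog(string message)
    {
        Console.WriteLine("debug :: " + message);
    }

    // Hypothetical worker method, only here to show both mechanisms in use.
    public static int CountLines(string input)
    {
#if DEBUG_MODE
        // This statement is only compiled when DEBUG_MODE is defined.
        Console.WriteLine("debug :: counting lines");
#endif
        int count = input.Split('\n').Length;
        DebugLog("found " + count + " lines");
        return count;
    }
}
```

Either way the debug output costs nothing at run time when the symbol is not defined, because the code is never compiled in.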

As for run time, whenever you make a change to the source files, the first time it's run it gets compiled into some sort of byte code or something by asp.net. So the first time it's SLOW. After that it runs the pre-compiled code and execution is really fast. So though all the debug statements may slow down the initial compile time, I doubt it affects runtime afterward by much.

For the arraylist.... unfortunately c# doesn't allow you to dynamically grow an array. IE, when you declare the array you have to declare the size. Every time I hit a new tag to add to the array (or hit a tag to remove from it) I could read the array size, build a new array with the new size, then populate it with the old array's values... but that's pretty inefficient. The ArrayList allows itself to dynamically grow and shrink, hence why I chose it.

Lastly, regarding the adding and removing of multiple tags in a line, I kind of handle that. For removing tags, I don't check their order to make sure they're closed out properly. I basically check the last tag to be added to arraylist, see if the closing tag exists in the current line and if it does I remove the last tag in the arraylist. If the arraylist still has tags in it, I then check the line again, etc etc.

It's not quite "right" but I can't think of a situation where it would cause a problem.

Anyway, thanks for your interest. :p The more I work with c# the more I like parts of it... and the more I hate other parts of it. I'd love to get a nice fusion of the parts of c# and php that I like for a hybrid language that seriously rocks. :twisted:



Sat Jan 05, 2008 10:55 am
Emperor
My point about the array list was that you don't even have to use it. Lemme quote myself
GFreeman wrote:
array of 4 integers

So, the point is: you do not need to expand that array, only increment/decrement its elements.

My point about the debug mode was that, unless C# has a really smart optimizer, every C# if( something ) will cost you more than a C++ #ifdef something. But yeah, it might not be possible in C#, like it is not in Java.

Satis wrote:
It's not quite "right" but I can't think of a situation where it would cause a problem.

Well, how about briefly stating what it should do? I think I gave you a few examples that are worth checking, but I can't be sure since I could only barely see the purpose. The LF -> BR part I understood, but the further conditions were a bit.. foggy. Now, regarding multiple tags, I meant things like this:

Code:
...
<table><table><table><table><table><table><table>
...

I mentioned it because you only check whether a line contains a tag (unless I missed something in your code). For this input that check gives only a "true", not the number of <table>s (which is what you need in this case).



Sat Jan 05, 2008 1:47 pm
Emperor
Heh... here's a cookie for you: I just made a C++ variant, and I'm pretty sure you don't even need the closing tags array in the C# code. However, I may not have reproduced the thing exactly, in case I haven't understood what you are doing.

Anyway, have you got streams in C#? They're more comfortable than strings.


You do not have the required permissions to view the files attached to this post.



Sat Jan 05, 2008 2:34 pm
Felix Rex
hmmm, I can read part of that.

Sorry about not explaining what it does. Basically I scan each line for a tag. If I find the tag, I add it on to the end of my arraylist. I then scan the line for closing tags to close the last tag I put in my arraylist and if I find it, I remove the tag.

Regarding your example of <table> <table> <table> all on one line, yep, that'd break my method. I'd see the first table, add 1 table to the arraylist, then look for a different tag. I wouldn't record all instances of <table>. I guess instead of doing .Contains() I could iterate through every character in the line for a match. Ugh.

For the c++ variant, I understand parts of it but I'm not quite grokking the overall flow. Oh, and yes, c# can handle streams. I don't really understand the concept though. I use streams for reading files and such but to me it basically just takes the file and turns it into strings. If it's doing something else I don't really know what's going on.



Sat Jan 05, 2008 5:50 pm
Emperor
Look for '<'s in the text. If a '<' is followed by a tag name, then it is (most probably) an opening tag. If a '<' is followed by '/' and a tag name, then it is (most probably) a closing tag.
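
In C# terms, that scan could be sketched as below. The names TagScanSketch and Scan are hypothetical, the tag list is borrowed from the code earlier in the topic, and the "+name" / "-name" strings are just a convenient way to show the result. Matching is by prefix and ignores case, so this is a rough classifier and not an HTML parser.

```csharp
using System;
using System.Collections.Generic;

public static class TagScanSketch
{
    static readonly string[] Names = { "table", "style", "ul", "ol" };

    // Returns entries such as "+table" (opening) or "-ul" (closing),
    // in the order in which the tags appear in the text.
    public static List<string> Scan(string text)
    {
        var found = new List<string>();
        int pos = 0;
        while ((pos = text.IndexOf('<', pos)) >= 0)
        {
            pos++;   // step past the '<'
            bool closing = pos < text.Length && text[pos] == '/';
            int nameStart = closing ? pos + 1 : pos;

            foreach (string name in Names)
            {
                // Compare the characters after '<' (or "</") with the tag name.
                if (string.Compare(text, nameStart, name, 0, name.Length,
                                   StringComparison.OrdinalIgnoreCase) == 0)
                {
                    found.Add((closing ? "-" : "+") + name);
                    break;
                }
            }
        }
        return found;
    }
}
```

Because every '<' is visited, several tags of the same sort on one line are all reported, which covers the <table><table><table> case.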

The C++ thing isn't perfect in terms of leaks and speed either. But I thought it would not serve the purpose (showing some basic structure) if it were in full size and shine. So, don't spend much time on it.

Now, sigh, actually I didn't want you to tell me what your code is doing (I could barely see that), but to define the problem. Sorry for the ignorance, but I've had very few encounters with C#, and you may be assuming things that someone else (say, someone who knows less than or the same as me) doesn't know. So, let me try to define the problem:

You have HTML code in which you should replace LFs with BRs. Yet some sort of problem occurs when you put a BR in place of each LF, and that is why you don't emit it when the LF is nested inside certain tags.

If my guess about what you are trying to do is right, this might not be such a fortunate approach. Why don't you instead define the tags in which LF should be replaced with BR in the first place? For example, in <table><tr><td> ... </td></tr></table> you may have text. This text can be formatted with <br />s. So (abstract point) ... every time you want to solve some problem, define it. This topic may need a split so that the useful information is separated from the talk about it.



Sat Jan 05, 2008 6:06 pm
Felix Rex
Ah, ok. Yes, the point is that line feeds inside some HTML structures (style, table and lists) don't get converted to <br>. That way, if you have a prettily formatted table in HTML and you run it through the method to convert the LFs to <br>, you don't end up with 30 line breaks in front of it due to all the LFs inside the HTML code.
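
Stated that way, the whole thing can be sketched fairly compactly. This is not the method posted earlier but a rough alternative under assumptions: the names Nl2BrHtmlSafeSketch and Nl2BrHtmlSafe are made up, a regular expression stands in for the Contains() checks so that several tags on one line are all counted, and a single depth counter replaces the ArrayList. It leaves out the rule about consecutive empty lines, and the last line gets no <br />.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Nl2BrHtmlSafeSketch
{
    // Matches the opening or closing form of the four exempt tags,
    // e.g. "<table", "</ul". Group 1 holds the "/" of a closing tag.
    static readonly Regex ExemptTag = new Regex(
        @"<(/?)(table|style|ul|ol)\b", RegexOptions.IgnoreCase);

    public static string Nl2BrHtmlSafe(string input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        // Normalize CRLF and lone CR to LF so each line break is one separator.
        string[] lines = input.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');

        var output = new StringBuilder();
        int depth = 0;   // number of exempt elements currently open

        for (int i = 0; i < lines.Length; i++)
        {
            foreach (Match m in ExemptTag.Matches(lines[i]))
            {
                if (m.Groups[1].Length == 0) depth++;   // opening tag
                else if (depth > 0) depth--;            // closing tag
            }

            output.Append(lines[i]);
            if (i < lines.Length - 1)
            {
                // Only emit a <br /> when no exempt element is open after this line.
                if (depth == 0) output.Append("<br />");
                output.Append('\n');
            }
        }
        return output.ToString();
    }
}
```

As in the posted code, a line that closes the last open tag still gets its <br />, so the break lands after the table or list and not inside it.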

The reason I need it is that the data may be entered by people who don't understand HTML. They input it via a textarea and it then gets saved to a database. When redisplaying the content in HTML form I don't want it messing up the LFs. How can someone who doesn't know HTML add HTML that'll screw up the formatting? Microsoft! They're practically HTML ignorant but know enough about MS apps to 'save for web' and get HTML markup to create tables, without actually knowing HTML.

Hope that helps. :)



Sun Jan 06, 2008 9:53 am
Emperor
OK, you know best what a user can put into the textarea you are expecting input from. The task is pretty abstract without that (and thanks, I don't need to know it), but I think I can offer a suggestion: maybe it is easier to define in which areas of the input the conversion should be done, no matter if they are in a table or even a list (you can use <br>s in a list).



Sun Jan 06, 2008 12:15 pm