Helicon Tech : ISAPI_Rewrite 3.0
Topic: Encoded question mark in URI gets decoded
da_cameron
Newbie


Joined: 15 September 2011
Location: United Kingdom
Posts: 2
Posted: 15 September 2011 at 9:37am

Hi
I have a problem wherein I have a URI of this format:

/path/to/file/key1/value1/key2/value2/key3/value3, etc

That is, there are key/value pairs embedded in the URI.  One of the values is user-entered from a form, so it can be anything.  We URI-encode this value with the JavaScript function encodeURIComponent() (http://www.w3schools.com/jsref/jsref_encodeuricomponent.asp), so that it's "safe" to put in the URI.  This all works fine.
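
For anyone wanting to reproduce the encoding step outside a browser, here is a minimal Python sketch. It assumes urllib.parse.quote with a safe set chosen to approximate what encodeURIComponent() leaves untouched; the helper name encode_component is made up for illustration and is not part of our application.

```python
from urllib.parse import quote

def encode_component(value: str) -> str:
    # Approximation of JavaScript's encodeURIComponent():
    # quote() never escapes letters, digits or - _ . ~
    # and the safe set below additionally keeps ! * ' ( ).
    # Everything else, including "?" and "/", becomes %XX (UTF-8).
    return quote(value, safe="!*'()")

print(encode_component("tahi?"))  # tahi%3F
```

So the form value "tahi?" ends up in the URI as "tahi%3F", which is what the example below uses.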

We have some rewrite rules to expand the URI back out into URL parameter/value pairs before being passed to the app server (ColdFusion in this case, although this is irrelevant).

What we see happening is that the rewrite engine decodes the question mark in one rule, and then the following rule, which has a QSA flag, takes the decoded question mark as the end of the URI and treats everything after it as query string.  This is obviously not correct.

Hopefully that makes sense... here's the relevant example:

URL:

 http://www.scribble.local/helicon/test2/first/tahi%3F/second/rua/third/toru/

This is representing name/value pairs, thus:

first = tahi? (the value is literally "tahi?", including the question mark, which has been URI-escaped: "%3F" is the "?")

second=rua

third=toru

 ("tahi, rua, toru, wha" is Maori for "one, two, three, four", btw)

Now the rewrite rules are thus:

RewriteCond %{REQUEST_URI}          .*?/first/(.*?)(?:/) [NC]
RewriteRule ^(helicon/test2/.*)$    $1?first=%1 [QSA,NC]

RewriteCond %{REQUEST_URI}          .*?/second/(.*?)(?:/) [NC]
RewriteRule ^(helicon/test2/.*)$    $1?second=%1 [QSA,NC]

RewriteCond %{REQUEST_URI}          .*?/third/(.*?)(?:/) [NC]
RewriteRule ^(helicon/test2/.*)$    $1?third=%1 [QSA,NC]

RewriteRule ^helicon/test2/.*$      /helicon/test.cfm?fourth=wha [QSA,NC,L]

 This results in the following URL parameters being created:

 URL

NB: [NAME] = [VALUE]

[second] = [rua]

[third] = [toru]

[fourth] = [wha]

[/second/rua/third/toru/?first] = [tahi?]

And this query string:

[fourth=wha&third=toru&second=rua&/second/rua/third/toru/?first=tahi?]

Obviously this is not correct.  The logs give a clue here:

(3) applying pattern '^(helicon/test2/.*)$' to uri 'helicon/test2/first/tahi?/second/rua/third/toru/'

(4) RewriteCond: input='/helicon/test2/first/tahi?/second/rua/third/toru/' pattern='.*?/first/(.*?)(?:/)' => matched

(1) Rewrite URL to >> /helicon/test2/first/tahi?/second/rua/third/toru/?first=tahi?

(2) rewrite 'helicon/test2/first/tahi?/second/rua/third/toru/' -> '/helicon/test2/first/tahi?/second/rua/third/toru/?first=tahi?'

 

(3) applying pattern '^(helicon/test2/.*)$' to uri 'helicon/test2/first/tahi'

(4) RewriteCond: input='/helicon/test2/first/tahi?/second/rua/third/toru/' pattern='.*?/second/(.*?)(?:/)' => matched

(1) Rewrite URL to >> /helicon/test2/first/tahi?second=rua&/second/rua/third/toru/?first=tahi?

(2) rewrite 'helicon/test2/first/tahi' -> '/helicon/test2/first/tahi?second=rua&/second/rua/third/toru/?first=tahi?'

 

(3) applying pattern '^(helicon/test2/.*)$' to uri 'helicon/test2/first/tahi'

(4) RewriteCond: input='/helicon/test2/first/tahi?/second/rua/third/toru/' pattern='.*?/third/(.*?)(?:/)' => matched

(1) Rewrite URL to >> /helicon/test2/first/tahi?third=toru&second=rua&/second/rua/third/toru/?first=tahi?

(2) rewrite 'helicon/test2/first/tahi' -> '/helicon/test2/first/tahi?third=toru&second=rua&/second/rua/third/toru/?first=tahi?'

 

(3) applying pattern '^helicon/test2/.*$' to uri 'helicon/test2/first/tahi'

(1) Rewrite URL to >> /helicon/test.cfm?fourth=wha&third=toru&second=rua&/second/rua/third/toru/?first=tahi?

(2) rewrite 'helicon/test2/first/tahi' -> '/helicon/test.cfm?fourth=wha&third=toru&second=rua&/second/rua/third/toru/?first=tahi?'

(2) internal redirect with /helicon/test.cfm?fourth=wha&third=toru&second=rua&/second/rua/third/toru/?first=tahi? [INTERNAL REDIRECT]

The problem seems to occur during the QSA operation on the second rewrite.  It's identifying the "query string" as starting after "tahi?".  This seems to be because:

* the rewrite engine has unescaped the %3F, for some reason;

* it's then seeing that unescaped %3F/? as the end of the URI; and everything after it as being query string.
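
The two steps above can be mimicked in a few lines of Python. This is only a sketch of the suspected behaviour (decode first, then split on the first "?"), not ISAPI_Rewrite's actual implementation; the function name split_after_decoding is hypothetical.

```python
from urllib.parse import unquote

def split_after_decoding(raw_path: str):
    # Step 1: percent-decode the whole path, so %3F becomes a literal "?".
    decoded = unquote(raw_path)
    # Step 2: treat the first "?" as the URI / query-string delimiter.
    uri, _, query = decoded.partition("?")
    return uri, query

uri, query = split_after_decoding(
    "helicon/test2/first/tahi%3F/second/rua/third/toru/"
)
print(uri)    # helicon/test2/first/tahi
print(query)  # /second/rua/third/toru/
```

The truncated URI it prints is the same 'helicon/test2/first/tahi' that the second and later rules are applied to in the log.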

 To me this seems like a bug?  Although I am happy to be corrected on that.

 I'd be even happier still if someone would advise of how I could work around this.

Cheers for wading through this.

Oh:
ISAPI_Rewrite version 3.1.0.61.  Dunno if you need any other info about our set-up?

--
Adam
Guests
Guest


Joined: 01 October 2003
Online Status: Online
Posts: -160
Posted: 19 September 2011 at 4:34am

Hello Adam,

This is not a bug. ISAPI_Rewrite captures everything before '?' in %{REQUEST_URI} and everything after '?' in %{QUERY_STRING}.
So we'd suggest modifying the logic of your rules to use something like:
Code:
RewriteCond %{REQUEST_URI}\?%{QUERY_STRING}

or simply add another condition:
Code:
RewriteCond %{QUERY_STRING} ^/second/rua/third/toru/$ [NC]



Regards
Andrew
da_cameron
Newbie


Joined: 15 September 2011
Location: United Kingdom
Posts: 2
Posted: 19 September 2011 at 4:46am

Yep, I get that.  I think maybe I wasn't clear in my original post.  Or because my post was quite long (although I thought it was all relevant detail), you missed the crux of the issue.

The problem is that there's a %-encoded question mark *IN THE URI*, and ISAPI_Rewrite is mistakenly identifying it as the delimiter between URI & query string.

Let me summarise.

We have a URL:

 http://www.scribble.local/helicon/test2/first/tahi%3F/second/rua/third/toru/

Notes:

* there is no question mark in the URL being rewritten

* there is no query string

* there is, however, a %-encoded question mark in the URI (%3F)

We apply a series of rewrites to that URL, and partway through ISAPI_Rewrite decodes that %-encoded question mark and then, mistakenly, starts treating that question mark as the delimiter between URI and query string.  It should not be doing that.

Note that the series of rewrites work perfectly to spec unless there's an encoded question mark in one of the URI components.

I have highlighted where this is happening in my original post.

Make sense?  If it does, it might be good to review the detail in my original post which points out where this is happening.

Or am I missing something?

I can provide more info if you let me know what would help.

Cheers, btw, for coming back to me on this issue.

--
Adam

Guests
Guest


Joined: 01 October 2003
Online Status: Online
Posts: -160
Posted: 20 September 2011 at 5:42pm

And I, on my side, understand that it's not a query_string, but there's no way to explain that to ISAPI_Rewrite. The only reason for '?' to appear in a URL is to be the separator between URI and query_string. I suggest you treat it as if it were a query_string.

My assumption is that your encoded '?' passes through the set of iterations, and after the first one the outgoing request has the '?' unencoded. You may try applying some of the encoding flags of the RewriteRule directive: NU and NE.

Regards
Andrew
baynezy
Newbie


Joined: 03 February 2009
Posts: 31
Posted: 21 September 2011 at 5:11am

Andrew,
I work with Adam and the NU and NE flags are not suitable to solve this problem.

nounicode|NU
This sounds like it would solve the problem as it says 'If NU flag is set, transformation from Unicode to UTF-8 will not take place and all Unicode characters remain encoded in %xx format.' However, it does not stop the %3F being converted to the '?'

noescape|NE
This, as described here: 'Don't escape output. By default ISAPI_Rewrite will encode all non-ANSI characters as %xx hex codes in output.', does the exact opposite of what we want.

It is clearly taking the URI and unescaping the values, and that then makes subsequent rules see the URI as everything to the left of the now unescaped question mark.

We cannot code for this as if it is a query string as there is not always an escaped question mark in the code to cause this issue.
Guests
Guest


Joined: 01 October 2003
Online Status: Online
Posts: -160
Posted: 22 September 2011 at 5:44am

Well, we always tell our customers to avoid '?' in any possible way if they want to use it for SEO or any purpose other than its separating function.
There's no actual way to escape '?' in the URL; the fact that a URL has '?' means that there's a query_string to the right of it. This is the way ISAPI_Rewrite works.


Regards
Andrew
baynezy
Newbie


Joined: 03 February 2009
Posts: 31
Posted: 22 September 2011 at 10:48am

There is no question mark in the URL; there is %3F. Your engine, after doing a RewriteRule, turns it into a '?', and ONLY when it is in the URI, not in the query string.

I have rewritten the whole thing as follows and this gets around the problem. I appreciate you trying to help but just dismissing it as not possible is a little disappointing.

Solution:-
Code:
RewriteRule ^(helicon/test2.*?/)first/([^/]+)/?(.*)$    $1$3?first=$2 [NC,QSA]
RewriteRule ^(helicon/test2.*?/)second/([^/]+)/?(.*)$    $1$3?second=$2 [NC,QSA]
RewriteRule ^(helicon/test2.*?/)third/([^/]+)/?(.*)$    $1$3?third=$2 [NC,QSA]

RewriteRule ^helicon/test2/.*$                             /helicon/test.cfm?fourth=wha [QSA,NC,L]
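
To show why these rules sidestep the problem, here is a rough Python simulation of the pattern matching only. It is a sketch under assumptions, not a model of ISAPI_Rewrite itself: it assumes each rule sees the already-decoded path (as the log in the first post suggests) and that QSA puts the new parameter in front of the existing ones. The names RULES and apply_rules are made up for illustration.

```python
import re

# The three patterns from the rules above, paired with the parameter they emit.
RULES = [
    (r"^(helicon/test2.*?/)first/([^/]+)/?(.*)$", "first"),
    (r"^(helicon/test2.*?/)second/([^/]+)/?(.*)$", "second"),
    (r"^(helicon/test2.*?/)third/([^/]+)/?(.*)$", "third"),
]

def apply_rules(path: str):
    query = []  # stand-in for the query string that QSA accumulates
    for pattern, key in RULES:
        match = re.match(pattern, path, re.IGNORECASE)  # [NC]
        if match:
            # $1$3: drop the key/value pair from the path ...
            path = match.group(1) + match.group(3)
            # ... and move the value ($2) into the query string.
            query.insert(0, (key, match.group(2)))
    return path, query

path, query = apply_rules("helicon/test2/first/tahi?/second/rua/third/toru/")
print(path)   # helicon/test2/
print(query)  # [('third', 'toru'), ('second', 'rua'), ('first', 'tahi?')]
```

Because each rule removes its key/value pair from the path in the same step that captures it, the decoded '?' never remains in the path for a later rule to trip over.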

Guests
Guest


Joined: 01 October 2003
Online Status: Online
Posts: -160
Posted: 23 September 2011 at 5:02am

Well, there's a workaround I decided to avoid mentioning. It's the directive "RewriteCompatibility2 on".
It makes ISAPI_Rewrite 3 use the old processing order from ISAPI_Rewrite 2. It allows you to write rules with '?' in the pattern without encoding (you simply escape them as '\?'), so your rules might work.

It works for the whole .htaccess, no matter whether you turn it on in the beginning or in the end of the .htaccess. So I'm concerned about your other rules that can be affected by this directive.

I find your workaround more delicate and elegant.


Regards
Andrew