Skip to content

tidy heap-buffer-overflow #217

Closed
Closed

Description

Image for: Description

Fernando has found and reported a heap-buffer-overflow directly to me, the text so far is attached below, and I have now been able to REPEAT the event, and am now searching for a fix.

I do not know anything about Valgrind, but luckily the Windows MSVC Debug configuration does the same. It inserts a 'deb_malloc' which allocates much more than the user request, and fills the leading and trailing memory with a marker, and returns the appropriate pointer to the user.

Leading is 0xfd, and trailing is 0xab, Then on 'deb_free', it checks the leading marker for buffer under-run, and the trailing marker for buffer over-run.

This is a view of the memory as allocated in this case

fd fd fd fd fd fd fd fd ab ab ab ab ab ab ab ab ab ab ab ab ab ab ...

The event happens in ParseValue, where the length is held in an int len, which gets to contain a -1! MINUS ONE!! It is parsing the anchor href in this strange string <a b=<a <?xm ?>b="c"G href="�"»

ParseValue then calls -
value = TY_(tmbstrndup)(doc->allocator, lexer->lexbuf + start, len);
to do the allocation...

Inexplicably tmpstrndup is decalared as -
tmbstr TY_(tmbstrndup)( TidyAllocator *allocator, ctmbstr str, uint len);

Notice the int has now become a uint, so can not test for less that zero... it does if (len > 0), but of course len is now 4294967295, but like for all such allocations tmbstrndup does -

    tmbstr cp = s = (tmbstr) TidyAlloc( allocator, 1+len );

Notice the plus 1, so it arrives at TidyAlloc with a ZERO!!!

Now it seems malloc does not mind a zero value, malloc(0), and dutifully returns a pointer!!!! But that is a pointer to what exactly?

Then tmbstrndup does the corruption, with -

while ( len-- > 0 && (*cp++ = *str++) ) /**/;

Of course ( len-- > 0 ) will be true until the 4294967295 expires ;=))

But thankfully the corruption stops when a 0 is reached in the lexer with (*cp++ = *str++). As indicated in this case it is storing the attribute "href", but that is 4+ bytes of corruption.

Then a final corruption, the reason for the plus 1 - *cp = 0;

This is a view of the memory after -

fd fd fd fd 68 72 65 66 00 00 ab ab ab ab ab ab ab ab

Note in this case, since the loop did not end when len became zero, so we end up with a double 0! But I get a big popup dialog when this memory is passed to 'deb_free'... advising me of possible heap corruption...

So that is a detailed desciption of the problem. Now to decide what would be the best fix! Ideas...

Since tmbstrndup does check is len > 0, could change the uint to an int, and then it would return a NULL memory. Probably not a good thing in all case...

I am still trying to understand exactly why the int len back in ParseValue went negative!

Will work on this... As indicated any ideas welcome...

And again thanks to Fernando for finding this case... What follows is our direct conversation to date...

Hey Geoff,

I believe I tried both versions, the old (which is the one that Debian
stable ships) and the new with HTML5 support from github.

Did you try to run it under valgrind? Valgrind reports the errors but
the app won't segfault.

I can give it a try later again and tell you the exact version I used.

Thanks

On 6/1/15 5:28 AM, Geoff McLane wrote:

Hi Fernando,

Can you show me what version of tidy you are
using?

$ tidy-asan -v

From the dump you provided, this shows
libtidy-0.99.so, which is now a *** VERY ***
OLD version ;=((.

I tested your sample html -

$ printf
"\x3c\x61\x20\x62\x3d\x3c\x61\x20\x3c\x3f\x78\x6d\x0d\x3f\x3e\x62\x3d\x22\x63\x22\x47\x20\x68\x72\x65\x66\x3d\x22\x12\x22\xbb"

with the latest tidy5 version 4.9.30 built from
this source -

https://github.com/htacg/tidy-html5

and had no problem in Ubuntu 14.04 linux, 64-bit, nor
in Windows 7 64-bits... many runs...

I tried (remember tidy5 defaults to UTF-8 if no other
indication) -

$ tidy5 err.html

Got 10 warnings, including invalid UTF-8 bytes (char.
code U+00BB)

$ tidy5 --show-body-only yes err.html

This reduces the warnings to 7 for this html snippet,
still with invalid UTF-8

Even tried -

$ tidy5 --show-body-only yes -big5 err.html

The warning reduce to 6, dropping the invalid
UTF-8 warning...

BTW, all tidy builds default to adding -

-DSUPPORT_ASIAN_ENCODING=1

Of course I can do nothing about earlier
versions, except urge an upgrade. libtidy-0.99.so
is circa 2008/2009...

We have had some contact with some Debian
maintainers, and hopefully they are in the
process of upgrading tidy...

We also provide some binary releases -
http://www.htacg.org/binaries/
however still to get around to adding 4.9.30

Also note the binary in these is named 'tidy5'
to avoid overwriting any existing 'tidy'. This
will change when the 5.0.0 is released...

But if the problem persists in the latest 4.9.30
(or later), then please file an Issue here -

https://github.com/htacg/tidy-html5/issues

Also add any configuration used, including any
TIDY_CONIFG_FILE if defined...

Thank you for reporting...

Regards,
Geoff.

On 01/06/15 10:10, Fernando Muñoz wrote:

Hello,

I contacted Debian about the issue on May 17, so far I have not
received a response about a CVE assignment.

Thanks!

---------- Forwarded message ----------
From: Fernando Muñoz fernando@null-life.com
Date: Sun, May 17, 2015 at 8:11 PM
Subject: tidy heap-buffer-overflow
To: security@debian.org

Package: tidy

tidy is affected by a write out of bounds when processing malformed
html files.

$ printf
"\x3c\x61\x20\x62\x3d\x3c\x61\x20\x3c\x3f\x78\x6d\x0d\x3f\x3e\x62\x3d\x22\x63\x22\x47\x20\x68\x72\x65\x66\x3d\x22\x12\x22\xbb"

err.html
An asan-enabled build of tidy outputs:

$ tidy-asan err.html

==2196==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xb53006b1 at pc 0xb71df8fe bp 0xbfac9928 sp 0xbfac9918
WRITE of size 1 at 0xb53006b1 thread T0
#0 0xb71df8fd in prvTidytmbstrndup
(/usr/lib/libtidy-0.99.so.0+0x15c8fd)
#1 0xb7141060 in prvTidyGetToken
(/usr/lib/libtidy-0.99.so.0+0xbe060)
#2 0xb711856e in prvTidyParseDocument
(/usr/lib/libtidy-0.99.so.0+0x9556e)
#3 0xb71f2a58 in prvTidyDocParseStream
(/usr/lib/libtidy-0.99.so.0+0x16fa58)
#4 0xb71f34a5 in tidyParseFile (/usr/lib/libtidy-0.99.so.0+0x1704a5)
#5 0x804bfa9 (/usr/bin/tidy+0x804bfa9)
#6 0xb6edf72d in __libc_start_main
(/lib/i386-linux-gnu/libc.so.6+0x1872d)
#7 0x804fa4e (/usr/bin/tidy+0x804fa4e)

0xb53006b1 is located 0 bytes to the right of 1-byte region
[0xb53006b0,0xb53006b1)
allocated by thread T0 here:
#0 0xb72af18c in __interceptor_malloc
(/usr/lib/i386-linux-gnu/libasan.so.1+0x5118c)
#1 0xb71c5963 (/usr/lib/libtidy-0.99.so.0+0x142963)
...

Valgrind with the standard build:

$ valgrind tidy err.html
...
==30499== Invalid write of size 1
==30499== at 0x408805C: prvTidytmbstrndup (tmbstr.c:39)
==30499== by 0x40738A8: ParseValue (lexer.c:3486)
...

==30499== Invalid write of size 1
==30499== at 0x4088065: prvTidytmbstrndup (tmbstr.c:41)
==30499== by 0x40738A8: ParseValue (lexer.c:3486)
==30499== by 0x4075F39: ParseAttrs (lexer.c:3603)
==30499== by 0x4075F39: GetTokenFromStream (lexer.c:2416)

...
file: tmbstr.c

39 while ( len-- > 0 && (cp++ = *str++) )
40 /
*/;
41 *cp = 0;

Credit: Fernando Muñoz

Metadata

Image for: Metadata

Metadata

Image for: Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

Image for: Issue actions