Fix content encoding issues

Summary:
Changelog: [Network] Non-binary requests are now properly utf-8 decoded on both iOS and Android, both when gzipped and when not gzipped

This diff fixes a long-standing, ping-pong issue regarding network decoding differences between
* iOS vs Android
* binary vs utf-8
* gzipped vs uncompressed

The changes aren't too big, but the underlying investigation was :)

The primary contributions of this diff are:

First, adding test cases for known problematic cases. This is done by grabbing the messages that are sent from the flipper client to flipper using the flipper messages plugin; this is the base64 data that is stored in the `.txt` files. Beyond that, all tests use public endpoints, so that we can get hold of both the raw original files and the way we expect them to be displayed in flipper.
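A minimal sketch of how such a fixture-based test could look (the file name and fixture content below are made up for illustration; the real tests use the captured payloads from the `.txt` files):

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Simulate a captured payload: the .txt fixture stores the body as base64,
// exactly as the flipper client sends it over the wire.
const original = '# 捐助\n\nMobX 文档';
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fixtures-'));
const fixtureFile = path.join(dir, 'donating.txt');
fs.writeFileSync(fixtureFile, Buffer.from(original, 'utf8').toString('base64'));

// The test reads the fixture back and decodes it as utf-8 (not ASCII),
// then compares against the known content of the public endpoint.
const captured = fs.readFileSync(fixtureFile, 'utf8');
const decoded = Buffer.from(captured, 'base64').toString('utf8');
console.log(decoded === original); // true
```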

For testing, a simple RN app was built, with a button that fires a bunch of requests. The first 3 are captured in unit tests; the last one is not idempotent, but it is the case reported in #1466, so it is left in as a manual verification.

```
const fetchData = async () => {
  await fetch(
    'https://raw.githubusercontent.com/SangKa/MobX-Docs-CN/master/docs/donating.md',
    {
      headers: {
        'Accept-Encoding': 'identity', // signals that we don't want gzip
      },
    },
  );
  await fetch('https://reactnative.dev/img/tiny_logo.png?x=' + Math.random());
  await fetch(
    'https://raw.githubusercontent.com/SangKa/MobX-Docs-CN/master/docs/donating.md',
  );
  await fetch(
    'https://ex.ke.com/sdk/recommend/html/100001314?hdicCityId=110000&paramMap[source]=&id=100001314&mediumId=100000037&elementId=&resblockId=1111027381003&templateConfig=%5Bobject%20Object%5D&fbExpoId=346620976471638017&fbQueryId=&required400=true&unique=1111027381003&parentSceneId=',
  );
};
```

The second contribution of this diff is that it no longer uses weird URL-encoder hacks to convert base64 to utf-8, but a proper library instead. The problem with our original solution, using `atob`, is that it decodes to ASCII, not to utf-8, which is the source of the original bugs. For more background, see: https://www.npmjs.com/package/js-base64#decode-vs-atob-and-encode-vs-btoa-
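To illustrate the difference, here is a self-contained sketch using Node's `Buffer` in place of `js-base64` (`'5L2g5aW9'` is the base64 of the utf-8 encoding of `'你好'`):

```typescript
const b64 = '5L2g5aW9';
const bytes = Buffer.from(b64, 'base64');

// atob-style decoding: one latin-1 character per byte, so multibyte
// utf-8 sequences come out mangled.
const asLatin1 = bytes.toString('latin1');

// A proper base64 -> utf-8 decode, which is what a real base64 library
// like js-base64 provides via Base64.decode.
const asUtf8 = bytes.toString('utf8');

console.log(asLatin1.length); // 6 — six garbled single-byte characters
console.log(asUtf8); // 你好
```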

Solves:
https://github.com/facebook/flipper/issues/1466
https://github.com/facebook/flipper/pull/1541
https://github.com/facebook/flipper/issues/1458

Supersedes D23837750

Future work: we don't inspect the charset in the `Content-Type: xxx;charset=...` header yet, which we should do for less common encodings, to make sure that they get displayed correctly as well.
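A hypothetical sketch of what that charset inspection could look like (`getCharset` is not part of this diff; the default is an assumption):

```typescript
// Extract the charset parameter from a Content-Type header value,
// falling back to utf-8 when none is given. (Hypothetical helper.)
function getCharset(contentType: string): string {
  const match = /;\s*charset=([^;\s]+)/i.exec(contentType);
  return match ? match[1].toLowerCase() : 'utf-8';
}

console.log(getCharset('text/html; charset=ISO-8859-1')); // iso-8859-1
console.log(getCharset('application/json')); // utf-8
```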

Future work: in features like copy data and copy as cURL, we always call `decodeBody` without checking whether we are actually dealing with non-binary data. It is probably better to keep binary data in base64 rather than decoding it, as decoding assumes the data is a utf-8 string, which might fail.

An assumption in these changes is that binary data is never gzipped, which is generally correct; gzip is not applied by web servers to things like images, as it would increase rather than decrease their size, and would waste a lot of computation power.
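The already-inflated-on-iOS fall-through that the diff implements with pako can be sketched with Node's `zlib` (assumed equivalent for this purpose; the real code calls `pako.inflate` on the base64-decoded bytes):

```typescript
import * as zlib from 'zlib';

// Try to gunzip; if the body is in fact already inflated (as happens on
// iOS, where Content-Encoding no longer matches the actual stream), fall
// back to treating the bytes as a plain utf-8 string.
function decodeMaybeGzipped(raw: Buffer): string {
  try {
    return zlib.gunzipSync(raw).toString('utf8');
  } catch (e) {
    if (!String(e).includes('incorrect header check')) {
      throw e; // real corruption, not just "not actually gzipped"
    }
    return raw.toString('utf8');
  }
}

console.log(decodeMaybeGzipped(zlib.gzipSync(Buffer.from('hello')))); // hello
console.log(decodeMaybeGzipped(Buffer.from('hello'))); // hello
```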

Reviewed By: cekkaewnumchai

Differential Revision: D23403095

fbshipit-source-id: 5099cc4a7503f0f63bd10585dc6590ba893f3dde
Author: Michel Weststrate (committed by Facebook GitHub Bot)
Date: 2020-10-14 05:50:07 -07:00
Parent: 5c82b9d860
Commit: 6b7b1fab5c
15 changed files with 143 additions and 32 deletions


```diff
@@ -9,6 +9,7 @@
 import pako from 'pako';
 import {Request, Response, Header} from './types';
+import {Base64} from 'js-base64';

 export function getHeaderValue(headers: Array<Header>, key: string): string {
   for (const header of headers) {
@@ -24,44 +25,35 @@ export function decodeBody(container: Request | Response): string {
     return '';
   }
-  const b64Decoded = atob(container.data);
   try {
-    if (getHeaderValue(container.headers, 'Content-Encoding') === 'gzip') {
-      // for gzip, use pako to decompress directly to unicode string
-      return decompress(b64Decoded);
+    const isGzip =
+      getHeaderValue(container.headers, 'Content-Encoding') === 'gzip';
+    if (isGzip) {
+      try {
+        // The request is gzipped, so convert the base64 back to the raw bytes first,
+        // then inflate. pako will detect the BOM headers and return a proper utf-8 string right away
+        return pako.inflate(Base64.atob(container.data), {to: 'string'});
+      } catch (e) {
+        // on iOS, the stream sent to flipper is already inflated, so the content-encoding will not
+        // match the actual data anymore, and we should skip inflating.
+        // In that case, we intentionally fall through
+        if (!('' + e).includes('incorrect header check')) {
+          throw e;
+        }
+      }
     }
-    return b64Decoded;
+    // If this is not a gzipped request, assume we are interested in a proper utf-8 string.
+    // If the raw binary data is needed, in base64 form, use container.data directly.
+    return Base64.decode(container.data);
   } catch (e) {
     console.warn(
-      `Flipper failed to decode request/response body (size: ${b64Decoded.length}): ${e}`,
+      `Flipper failed to decode request/response body (size: ${container.data.length}): ${e}`,
     );
     return '';
   }
 }
-
-function decompress(body: string): string {
-  const charArray = body.split('').map((x) => x.charCodeAt(0));
-  const byteArray = new Uint8Array(charArray);
-  try {
-    if (body) {
-      return pako.inflate(byteArray, {to: 'string'});
-    } else {
-      return body;
-    }
-  } catch (e) {
-    // Sometimes Content-Encoding is 'gzip' but the body is already decompressed.
-    // Assume this is the case when decompression fails.
-    if (!('' + e).includes('incorrect header check')) {
-      console.warn('decompression failed: ' + e);
-    }
-  }
-  return body;
-}

 export function convertRequestToCurlCommand(request: Request): string {
   let command: string = `curl -v -X ${request.method}`;
   command += ` ${escapedString(request.url)}`;
@@ -70,7 +62,7 @@ export function convertRequestToCurlCommand(request: Request): string {
     const headerStr = `${header.key}: ${header.value}`;
     command += ` -H ${escapedString(headerStr)}`;
   });
-  // Add body
+  // Add body. TODO: we only want this for non-binary data! See D23403095
   const body = decodeBody(request);
   if (body) {
     command += ` -d ${escapedString(body)}`;
```