Command injection when ingesting a remote Kaggle dataset due to a lack of input sanitization in the ingest_kaggle() API
(,3.9.10]
Deep Lake can be used for storing data and vectors while building LLM applications or to manage datasets while training deep learning models.
Datasets can be loaded from various external sources, such as the Kaggle platform.
To load an external Kaggle dataset, a user calls the exported ingest_kaggle method. The method receives a tag parameter that identifies the Kaggle dataset to download. This tag parameter propagates into the internal _exec_command method without any form of input filtering or sanitization.
Due to this issue, if a user builds an externally facing application on top of Deep Lake that lets users upload Kaggle datasets, an attacker can achieve remote code execution on the server, compromising the confidentiality, integrity, and availability of its resources.
import deeplake

deeplake.ingest_kaggle(
    "some/text||touch /tmp/hacked",
    "/tmp/somepath",
    "./tmp/somepath2",
    kaggle_credentials={"username": "mister", "key": "john", "password": "doe"},
    overwrite=True,
)
No mitigations are supplied for this issue.
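Until a patched release is available, an application embedding Deep Lake could defend itself by validating user-supplied tags against the expected owner/dataset slug shape before ever calling ingest_kaggle. The sketch below is a hypothetical workaround, not an official fix; the regex is an assumption about valid Kaggle slugs, not a published grammar.

```python
import re

# Assumed shape of a Kaggle dataset slug: 'owner/dataset', limited to
# alphanumerics, dots, underscores, and hyphens. Shell metacharacters
# such as '|', ';', '&', and spaces are rejected outright.
KAGGLE_TAG_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")

def is_safe_kaggle_tag(tag: str) -> bool:
    """Return True only for tags matching the assumed owner/dataset slug shape."""
    return bool(KAGGLE_TAG_RE.fullmatch(tag))

print(is_safe_kaggle_tag("ashishjangra27/face-mask-12k-images-dataset"))  # True
print(is_safe_kaggle_tag("some/text||touch /tmp/hacked"))                 # False
```

An allow-list of known characters is preferable to trying to strip or escape dangerous ones, since escaping rules vary between shells.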