Command injection when ingesting a remote Kaggle dataset due to a lack of input sanitization in the ingest_kaggle() API
(,3.9.10]
Deep Lake can be used for storing data and vectors while building LLM applications or to manage datasets while training deep learning models.
Datasets can be loaded from various external sources, such as the Kaggle platform.
To load an external Kaggle dataset, a user calls the exported ingest_kaggle method. The method receives a tag parameter that identifies the Kaggle dataset to download. This tag parameter propagates into the internal _exec_command method without any form of input filtering or sanitization.
Due to this issue, if a user builds an externally facing application on top of Deep Lake that lets users upload Kaggle datasets, an attacker can achieve remote code execution on the server, compromising the confidentiality, integrity, and availability of its resources.
import deeplake

deeplake.ingest_kaggle(
    "some/text||touch /tmp/hacked",
    "/tmp/somepath",
    "./tmp/somepath2",
    kaggle_credentials={"username": "mister", "key": "john", "password": "doe"},
    overwrite=True,
)
No mitigations are supplied for this issue.
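Until a patched release is available, an application embedding Deep Lake could defend itself by validating user-supplied tags against the expected owner/dataset slug shape before ever calling ingest_kaggle. The sketch below is a hypothetical workaround, not an official fix; the regex is an assumption about valid Kaggle slugs, not a published grammar.

```python
import re

# Assumed shape of a Kaggle dataset slug: 'owner/dataset', limited to
# alphanumerics, dots, underscores, and hyphens. Shell metacharacters
# such as '|', ';', '&', and spaces are rejected outright.
KAGGLE_TAG_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$")

def is_safe_kaggle_tag(tag: str) -> bool:
    """Return True only for tags matching the assumed owner/dataset slug shape."""
    return bool(KAGGLE_TAG_RE.fullmatch(tag))

print(is_safe_kaggle_tag("ashishjangra27/face-mask-12k-images-dataset"))  # True
print(is_safe_kaggle_tag("some/text||touch /tmp/hacked"))                 # False
```

An allow-list of known characters is preferable to trying to strip or escape dangerous ones, since escaping rules vary between shells.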