EduMIND is an enterprise-grade, highly modular Bilingual Lecture Assistant & Active Learning Pipeline. It is designed for academic environments where lectures mix languages (e.g., code-mixed Vietnamese-English, such as “hôm nay chúng ta học attention mechanism”, i.e. “today we study the attention mechanism”). EduMIND transcribes bilingual speech, measures code-switching metrics, translates text while preserving technical terms, indexes slides, and executes retrieval-augmented generation (RAG).
The system integrates a Human-in-the-Loop Active Learning framework powered by Label Studio and an ML backend to continually harvest human-corrected data, immediately updating the local knowledge base and building a gold-standard corpus.
            +----------------------------------+
            |     Bilingual Audio Lecture      |
            +-----------------+----------------+
                              |
                              v
                 [ 🎙️ Bilingual Note-Taker ]
                  Whisper ASR + Post-RegEx
                              |
                              v
              [ 🔄 VietMix Translation & CMI ]
                 Dict / Seq2Seq Translation
                              |
                              v
                [ 📚 Anti-Forget RAG Engine ]
                   PDF Chunking -> Qdrant
                              |
  +---------------------------+---------------------------+
  |                                                       |
  v (Retrieval QA)                                        v (Active Learning)
[ Streamlit Assistant ]                    [ Label Studio UI (Port 8080) ]
 RAG Chat + Analytics                       TA/Human Review & Correction
                                                          |
                                                          v
                                               [ edumind_ml_backend ]
                                               - Writes to corpus.jsonl
                                               - Re-indexes to Qdrant Vector DB
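The last box in the diagram appends human-corrected data to corpus.jsonl. A minimal sketch of that append step follows, assuming one JSON object per line. The function names (`append_correction`, `load_corpus`) and the record fields (`task_id`, `asr_text`, `corrected_text`) are hypothetical and are not taken from the EduMIND code base. Re-indexing into Qdrant is left out of the sketch.

```python
import json
import tempfile
from pathlib import Path


def append_correction(corpus_path, record):
    """Append one corrected record to a JSON Lines corpus file."""
    path = Path(corpus_path)
    # Create data/processed/ (or any parent directory) on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Vietnamese diacritics readable in the file.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_corpus(corpus_path):
    """Read every record back; a missing file yields an empty list."""
    path = Path(corpus_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Demo in a throwaway directory so the real corpus is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp) / "data" / "processed" / "corpus.jsonl"
        append_correction(corpus, {
            "task_id": 1,  # hypothetical field names
            "asr_text": "hôm nay chúng ta học a tension mechanism",
            "corrected_text": "hôm nay chúng ta học attention mechanism",
        })
        print(len(load_corpus(corpus)))
```

Appending one line per correction keeps earlier records untouched, so the corpus can grow incrementally as reviewers submit corrections.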
Each token is tagged with a language (vi, en, or other) to compute the Code-Mixing Index (CMI):
\(\text{CMI} = \frac{N - \max(w_{\text{vi}}, w_{\text{en}})}{N}\)
(where $N$ is the total count of linguistic tokens, and $w_{\text{vi}}$ and $w_{\text{en}}$ are the counts of Vietnamese and English tokens).
RuleBasedTranslationProvider: Fast dictionary-lookup mapping with negligible latency.
HuggingFaceTranslationProvider: Neural Seq2Seq model (e.g., Helsinki-NLP/opus-mt-vi-en) with automatic rule-based fallback.
ms-marco-MiniLM-L-6-v2) before synthesis.
data/processed/corpus.jsonl).

├── LICENSE                   <- MIT License
├── README.md                 <- This main system guide
├── CONTRIBUTING.md           <- Development, CI/CD, and style guidelines
├── Makefile                  <- Task automation commands
├── pyproject.toml            <- Project specs & package dependencies
├── uv.lock                   <- Lockfile for exact package reproducibility
├── docker-compose.yml        <- Docker compose configuration for the LS stack
├── Dockerfile.label-studio   <- Multi-stage Docker build for the ML backend
│
├── configs/
│   └── default_config.yaml   <- Hyperparameter configurations
│
├── data/
│   ├── raw/
│   │   ├── audio_chunks/     <- Raw lecture wav chunks
│   │   └── pdf_slides/       <- PDF lecture materials
│   └── processed/
│       └── corpus.jsonl      <- Target gold-standard active learning corpus
│
├── edumind/                  <- Core Python source package
│   ├── app.py                <- Streamlit frontend implementation
│   ├── config/               <- Pydantic validation definitions
│   ├── core/                 <- Logger, Dependency Injection container, Exceptions
│   ├── models/               <- Data models & schemas (ASR, Translation, RAG)
│   ├── modules/              <- Core engines (RAG, Speech ASR, VietMix Translator)
│   ├── services/             <- Strategy implementations (Embedding, LLM, Translation)
│   └── utils/                <- String utilities, file helpers, model registries
│
├── label_studio_backend/     <- Flask active learning ML Backend
│   ├── _wsgi.py              <- WSGI entry point for container execution
│   ├── model.py              <- Label Studio ML backend subclass code
│   └── setup_env.sh          <- Shell bootstrapper for local host testing
│
└── tests/                    <- Complete unit & integration test suite
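
The Code-Mixing Index formula given earlier, CMI = (N - max(w_vi, w_en)) / N, can be illustrated with a short sketch. This is not the EduMIND implementation: the function name `code_mixing_index` is hypothetical, the language tags are supplied by hand rather than by a tagger, and the sketch assumes that N counts every tag passed in (including "other"), so non-linguistic tokens such as punctuation should be filtered out beforehand.

```python
from collections import Counter


def code_mixing_index(tags):
    """Compute CMI = (N - max(w_vi, w_en)) / N from per-token language tags.

    `tags` is an iterable of "vi", "en", or "other" labels, one per token.
    Assumption: N is the number of tags received, "other" included.
    """
    counts = Counter(tags)
    n = sum(counts.values())
    if n == 0:
        # No tokens: treat the utterance as not code-mixed.
        return 0.0
    # Counter returns 0 for a language that never appears.
    dominant = max(counts["vi"], counts["en"])
    return (n - dominant) / n


if __name__ == "__main__":
    # "hôm nay chúng ta học attention mechanism", tagged by hand:
    # five Vietnamese tokens followed by two English tokens.
    tags = ["vi", "vi", "vi", "vi", "vi", "en", "en"]
    print(round(code_mixing_index(tags), 3))  # prints 0.286, i.e. (7 - 5) / 7
```

A monolingual utterance scores 0, and an even two-language split scores 0.5, so higher values indicate heavier code-mixing.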
This project uses uv to manage the Python virtual environment and dependencies. Ensure it is installed on your machine.
git clone <repo-url>
cd edumind
make requirements
This automatically builds a virtual environment under .venv/ and installs the package in editable mode.
Copy .env.example to .env and fill in your values (such as LLM API keys):
cp .env.example .env
The system can be run in two main ways: Local Host Development or Containerized Docker Compose Stack.
To launch the interactive frontend dashboard:
make app
Access the interface at http://localhost:8501.
This launches both Label Studio UI and the EduMIND ML Backend in a shared Docker network:
# Start the stack in background
make docker-up
# Check container status
docker compose ps
# View logs
make docker-logs
# Stop the stack
make docker-down
Label Studio UI: http://localhost:8080 (Credentials: admin@edumind.local / edumind_admin_2024)
ML Backend: http://localhost:9090 (connected at http://ml-backend:9090 inside Docker)
If you want to run Label Studio and the ML Backend natively on your host system:
# Installs Label Studio binaries and starts both servers in one terminal session
make run-ls
To run the complete suite of 50+ unit and integration tests:
make test
Code formatting is strictly checked using Ruff. Always format your code before pushing changes:
# Auto-format and resolve lint errors
make format
# Dry-run check
make lint