wesslen committed
Commit f40eb98 · 1 Parent(s): d0750ff

copy repo from rlhf-ranking; update for respond

Files changed (5)
  1. Dockerfile +37 -0
  2. data/dataset.jsonl +0 -0
  3. prodigy.json +13 -0
  4. prodigy.sh +1 -0
  5. rlhf_respond.py +33 -0
Dockerfile ADDED
@@ -0,0 +1,37 @@
+ FROM python:3.9
+
+ #COPY requirements.txt /app/
+ WORKDIR /app
+ # # Set up a new user named "user" with user ID 1000
+ # RUN useradd -m -u 1000 user
+ # # Switch to the "user" user
+ # USER user
+ # # Set home to the user's home directory
+ # ENV HOME=/home/user \
+ #     PATH=/home/user/.local/bin:$PATH
+
+ # # Set the working directory to the user's home directory
+ # WORKDIR $HOME/app
+
+ # # Copy the current directory contents into the container at $HOME/app setting the owner to the user
+ # COPY --chown=user . $HOME/app
+
+ RUN --mount=type=secret,id=LICENSE_KEY,mode=0444,required=true \
+     pip install --upgrade pip \
+     && pip install typing_extensions==4.5.0 \
+     && pip install --quiet prodigy -f https://$(cat /run/secrets/LICENSE_KEY)@download.prodi.gy
+
+ RUN chmod 777 .
+
+ COPY prodigy.json .
+ COPY data ./data/
+ COPY rlhf_respond.py .
+ COPY prodigy.sh .
+
+ ENV PRODIGY_HOME /app
+ ENV PRODIGY_LOGGING "verbose"
+ ENV PRODIGY_ALLOWED_SESSIONS "user1,user2"
+
+ EXPOSE 7860
+
+ CMD ["bash","prodigy.sh"]
data/dataset.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
prodigy.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "port": 7860,
+   "host": "0.0.0.0",
+   "db": "sqlite",
+   "db_settings": {
+     "sqlite": {
+       "name": "prodigy.db",
+       "path": "/app"
+     }
+   },
+   "feed_overlap": true,
+   "show_stats": true
+ }
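Review note: the port in prodigy.json has to agree with the `EXPOSE 7860` line in the Dockerfile, and the host has to be `0.0.0.0` for the server to be reachable from outside the container. A minimal sketch of a consistency check follows; `check_config` and `EXPOSED_PORT` are hypothetical names introduced here for illustration and are not part of this commit.

```python
import json

# Contents of prodigy.json as added in this commit.
PRODIGY_JSON = """
{
  "port": 7860,
  "host": "0.0.0.0",
  "db": "sqlite",
  "db_settings": {"sqlite": {"name": "prodigy.db", "path": "/app"}},
  "feed_overlap": true,
  "show_stats": true
}
"""

# Port exposed by the Dockerfile in this commit (EXPOSE 7860).
EXPOSED_PORT = 7860


def check_config(raw, exposed_port):
    """Return a list of human-readable problems; an empty list means OK."""
    config = json.loads(raw)
    problems = []
    if config.get("port") != exposed_port:
        problems.append(
            f"port {config.get('port')} does not match exposed port {exposed_port}"
        )
    if config.get("host") != "0.0.0.0":
        problems.append(
            f"host {config.get('host')!r} is not reachable from outside the container"
        )
    return problems


problems = check_config(PRODIGY_JSON, EXPOSED_PORT)
print(problems or "config is consistent with the Dockerfile")
```

A check like this could run before the image is built, so a mismatched port is caught early instead of showing up as an unreachable app.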
prodigy.sh ADDED
@@ -0,0 +1 @@
+ python -m prodigy rlhf.respond rlhf_data data/dataset.jsonl -F rlhf_respond.py
rlhf_respond.py ADDED
@@ -0,0 +1,33 @@
+ import prodigy
+ import itertools as it
+ from prodigy.util import set_hashes
+ from prodigy import get_stream
+
+
+ @prodigy.recipe(
+     "rlhf.respond",
+     dataset=("Dataset to save answers to", "positional", None, str),
+     source=("Datafile to load", "positional", None, str),
+ )
+ def ranking(dataset, source):
+     # Load your own streams from anywhere you want
+     stream = get_stream(source)
+
+     def prep_stream(stream):
+         for ex in stream:
+             ex['text'] = ex['instruction']
+             del ex['instruction']
+             yield ex
+
+     return {
+         "dataset": dataset,
+         "view_id": "blocks",
+         "stream": prep_stream(stream),
+         "config":{
+             "global_css": ".prodigy-option{font-size: 15px;}",
+             "blocks":[
+                 {"view_id": "text"},
+                 {"view_id": "text_input", "field_autofocus": True, "field_rows": 4, "field_placeholder": "Try to use 2-3 sentences to answer the question."},
+             ],
+         }
+     }
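Review note: the only data transformation in the recipe is `prep_stream`, which moves each example's `instruction` field to `text` so the `text` block can render it. The sketch below reproduces that step without Prodigy so it can be run on its own. The sample records and the `load_jsonl` helper are hypothetical; the real schema of data/dataset.jsonl is not visible in this diff beyond the `instruction` key the recipe reads. Unlike the recipe, this version copies each example instead of mutating it in place.

```python
import json
from typing import Dict, Iterable, Iterator


def prep_stream(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Rename each example's 'instruction' field to 'text'."""
    for ex in stream:
        ex = dict(ex)  # shallow copy so the caller's dict is left untouched
        ex["text"] = ex.pop("instruction")
        yield ex


# Hypothetical sample lines standing in for data/dataset.jsonl.
SAMPLE_JSONL = """\
{"instruction": "What is the capital of France?"}
{"instruction": "Explain what a Dockerfile is.", "id": 7}
"""


def load_jsonl(text: str) -> Iterator[Dict]:
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


tasks = list(prep_stream(load_jsonl(SAMPLE_JSONL)))
for task in tasks:
    print(task)
```

Any extra keys on an example (such as the hypothetical `id` above) pass through unchanged, which matches the recipe's behaviour of only touching `instruction`.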