In this document, we will train a simple character-level text generator using a small RNN built with PyTorch. In a follow-up page, we will run the trained model to see how well it can generate full passages of text.
1. Data Preparation
We will use the Children Stories Text Corpus dataset from Kaggle. It contains public domain children’s books from Project Gutenberg that are suitable for young readers.
1.1 Import Libraries
Show Code
import torch
import torch.nn as nn
import string
import random
import requests
import time
import math
import matplotlib.pyplot as plt

# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
Using device: cpu
1.2 Load Text Data
We will download “The Happy Prince and Other Tales” by Oscar Wilde from Project Gutenberg to use as our training corpus. The collection contains several stories, which provides enough data for meaningful training.
Show Code
# The Happy Prince and Other Tales
url = "https://www.gutenberg.org/files/902/902-0.txt"
response = requests.get(url)
text = response.text

# Simple cleaning to remove the legal header/footer if present
start_marker = "*** START OF THE PROJECT GUTENBERG EBOOK"
end_marker = "*** END OF THE PROJECT GUTENBERG EBOOK"
start_idx = text.find(start_marker)
end_idx = text.find(end_marker)
if start_idx != -1:
    text = text[start_idx + len(start_marker):]
if end_idx != -1:
    text = text[:end_idx]

# Clean up whitespace
text = text.strip()

print(f"Dataset length: {len(text)} characters")
print("First 200 characters:")
print(text[:200])
Dataset length: 91198 characters
First 200 characters:
d two bright sapphires, and a large red ruby glowed on his
sword-hilt.
He was very much admired indeed. “He is as beautiful as a weathercock,”
remarked one of the Town Councillors who wished to gain a reputation for
having artistic tastes; “only not quite so useful,” he added, fearing
lest people should think him unpractical, which he really was not.
“Why can’t you be like the Happy Prince?” asked a sensible mother of her
little boy who was crying for the moon. “The Happy Prince never dreams
of crying for anything.”
“I am glad there is some one in the world who is quite happy,” muttered a
disappointed man as he gazed at the wonderful statue.
“He looks just like an angel,
1.3 Character Vocabulary
Show Code
# string.printable is a pre-defined string in Python containing all digits, lowercase and uppercase letters,
# punctuation, and whitespace. This is our "universe" of possible characters.
all_characters = string.printable
n_characters = len(all_characters)

def char_to_index(char):
    try:
        return all_characters.index(char)
    except ValueError:
        return 0  # Default to 0 if an unknown char is found

def index_to_char(index):
    return all_characters[index]

# Filter the text to ensure all chars are in our vocabulary
# (this removes any unusual foreign characters)
text = ''.join([c for c in text if c in all_characters])

print(f"Vocabulary size: {n_characters}")
print(f"Sample: {all_characters[:90]}...")
print(f"Character to index mapping for 'Hello World!': {[char_to_index(char) for char in 'Hello World!']}")
Character to index mapping for 'Hello World!': [43, 14, 21, 21, 24, 94, 58, 24, 27, 21, 13, 62]
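Going the other way, index_to_char recovers the original string from those indices. A quick sanity check, assuming the vocabulary code above has been run:

# Decode the indices printed above back into characters
indices = [43, 14, 21, 21, 24, 94, 58, 24, 27, 21, 13, 62]
decoded = ''.join(index_to_char(i) for i in indices)
print(decoded)  # -> Hello World!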
1.4 Prepare Training Tensors
Functions to convert text strings into PyTorch tensors.
Show Code
# Simple example of how the line_to_tensor function below works
line = "Hello World"
tensor = torch.zeros(len(line), dtype=torch.long)
for c in range(len(line)):
    tensor[c] = char_to_index(line[c])
print(f"Tensor for 'Hello World', Length: {len(tensor)}, Values: {tensor}")
# The translators: from string to tensor and back
def line_to_tensor(line):
    # Create an empty placeholder. If the input line is "Cat" (length 3), this creates [0, 0, 0].
    tensor = torch.zeros(len(line)).long()
    for c in range(len(line)):
        tensor[c] = char_to_index(line[c])  # fill the placeholder with the index of each character
    return tensor.to(device)  # move the tensor to the chosen device (CPU or GPU)

"""
The Sampler
You cannot feed the entire book into the neural network at once.
Memory: it would crash your computer.
Learning: networks learn better by seeing small, random examples repeatedly rather than one giant example once.
"""
def random_training_segment(chunk_len=200):
    start_index = random.randint(0, len(text) - chunk_len - 1)  # pick a random start index in the text
    end_index = start_index + chunk_len + 1  # +1 for the target character
    return text[start_index:end_index]

# Test
segment = random_training_segment(50)
print(f"Random segment: '{segment}'")
print(f"Tensor shape: {line_to_tensor(segment).shape}")
Random segment: 'possible! GOLD Stick, that is what he
said. Gold'
Tensor shape: torch.Size([51])
2. Model Architecture
2.1 Define RNN Module
Show Code
class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1):
        super(CharRNN, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        # Embedding layer to convert input character IDs into dense vectors
        self.embedding = nn.Embedding(input_size, hidden_size)
        # RNN layer. It takes the current input and the "memory" from the previous step to calculate the new state.
        self.rnn = nn.RNN(hidden_size, hidden_size, n_layers, batch_first=True)
        # Fully connected layer to map RNN outputs to character scores
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        x = self.embedding(x)  # convert input character IDs to embeddings
        # Pass the embeddings and the previous hidden state (memory) into the recurrent layer
        out, hidden = self.rnn(x.unsqueeze(0), hidden)
        # Pass the recurrent layer's output through the decoder layer to get the final prediction scores
        out = self.fc(out.squeeze(0))
        return out, hidden

    def init_hidden(self):
        # Initialize the hidden state (memory) to zeros
        return torch.zeros(self.n_layers, 1, self.hidden_size).to(device)

    # "Reset" function
    def reset_hidden(self, hidden):
        # Detach the hidden state from the computation graph to limit backpropagation through time
        return hidden.detach()
2.1.1 Check the output shape of each layer
In this section we run a few simple code examples to inspect the shape of each layer's output. This gives a better understanding of how the input is transformed as it passes through each layer of the network.
You will notice that each character index of the input string is mapped to a randomly initialized vector of 128 values (the embedding size):
The embedding weights are randomly initialized at first.
During training, these values are updated so that similar characters get similar vectors, and the model learns useful representations.
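The shape checks below assume a model instance already exists. A minimal sketch of creating one (hidden_size = 128 matches the [*, 128] shapes printed below; using a single RNN layer is an assumption):

# Create a model instance for the shape checks
hidden_size = 128
model = CharRNN(n_characters, hidden_size, n_characters, n_layers=1).to(device)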
Show Code
test_line = "Hello world"
test_tensor = line_to_tensor(test_line)  # Shape: [len(test_line)]
hidden = model.init_hidden()

# Get the embedding output directly
embedding_out = model.embedding(test_tensor)
print("Tensor shape before embedding layer:", test_tensor.shape)
print("Embedding output shape after embedding layer:", embedding_out.shape)
print("Embedding output values:", embedding_out)
- rnn_out: memory for every step (full sequence output)
- new_hidden: memory for the last step only (final state)
Show Code
# Then pass through the RNN layer
rnn_out, new_hidden = model.rnn(embedding_out.unsqueeze(0), hidden)
print("RNN output shape:", rnn_out.shape)
# print("RNN output values:", rnn_out)
print("New hidden state shape:", new_hidden.shape)
# print("New hidden state values:", new_hidden)
RNN output shape: torch.Size([1, 11, 128])
New hidden state shape: torch.Size([1, 1, 128])
What does this mean?
- For each character position in the input sequence, the fully connected layer (fc) produces a vector of 100 scores, one per character in the vocabulary.
- Each score is the model's (pre-softmax) prediction for the next character at that position.
Show Code
# The fc layer output
fc_out = model.fc(rnn_out.squeeze(0))
print("FC layer output shape:", fc_out.shape)
FC layer output shape: torch.Size([11, 100])
This fc_out is what gets passed to the loss function before the backpropagation step. Each row is a vector of 100 raw scores (logits) over the character vocabulary; applying softmax turns them into probabilities.
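To make that concrete, here is a small sketch (reusing the fc_out computed above) that turns the scores for the last position into probabilities and reads off the most likely next character:

import torch.nn.functional as F

# Scores for the last position of "Hello world" -> prediction for the character after 'd'
last_scores = fc_out[-1]                # shape: [100]
probs = F.softmax(last_scores, dim=0)   # probabilities over the vocabulary, sums to 1
top_index = torch.argmax(probs).item()  # index of the most likely next character
# The model is still untrained here, so the prediction is essentially random
print("Most likely next character:", repr(index_to_char(top_index)))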
3. Training Utilities
3.1 Random Training Example Generator
Show Code
# Random Training Example Generator
def get_training_batch(chunk_len=200):
    chunk = random_training_segment(chunk_len)
    # Example: "Hello"
    # Input:  "Hell" (indices)
    # Target: "ello" (indices)
    input_str = chunk[:-1]
    target_str = chunk[1:]
    input_tensor = line_to_tensor(input_str)
    target_tensor = line_to_tensor(target_str)
    return input_tensor, target_tensor

# Test it
inp, tgt = get_training_batch(10)
print(f"Input shape: {inp.shape}")   # Should be [10]
print(f"Target shape: {tgt.shape}")  # Should be [10]
What happens inside forward (shape intuition):
- Input to model: [seq_len] = [200] integers (character IDs)
- Embedding converts each ID into a vector: [200, hidden_size]
- RNN processes 200 time steps and produces a hidden output per step
- Final linear layer maps each step to vocab logits: [200, n_characters]
- Target is also [200] (the “next character ID” at each time step)
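A quick way to verify these shapes is to push one random 200-character chunk through the model (a small sketch, reusing the helpers defined above and whatever model instance is currently in memory):

inp, tgt = get_training_batch(200)  # input and target, each of shape [200]
hidden = model.init_hidden()
out, hidden = model(inp, hidden)    # forward pass over the whole chunk
print("Input shape: ", inp.shape)   # torch.Size([200])
print("Output shape:", out.shape)   # torch.Size([200, 100]) -> one logit row per position
print("Target shape:", tgt.shape)   # torch.Size([200])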
Show Code
def train_step(model, optimizer, criterion, input_tensor, target_tensor):
    hidden = model.init_hidden()  # Initialize the hidden state
    model.zero_grad()             # Zero the gradients

    # Forward pass: send a sequence of characters through the model
    output, hidden = model(input_tensor, hidden)

    # Compute the loss
    loss = criterion(output, target_tensor)

    # Backward pass and optimize
    loss.backward()
    optimizer.step()

    return loss.item()
4. Training Process
4.1 Hyperparameter Setup
Setting input_size = n_characters tells the model how many distinct character IDs it can receive, i.e. the number of rows in the embedding table.
And output_size = n_characters is the size of the output vector: one score per possible next character.
So CharRNN(n_characters, hidden_size, n_characters, ...) means:
“I will receive character IDs in range [0, n_characters-1]”
“I will output a score for each of the n_characters (next-char classification)”
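The setup cell itself is not shown above, so here is a minimal sketch of it. hidden_size, chunk_len, n_iters and print_every are taken from the shapes and logs shown elsewhere in this document; the choice of Adam, the learning rate, and plot_every are assumptions. The model is created fresh here for training (the earlier instance was only for the shape checks).

# Hyperparameters (lr, optimizer choice, and plot_every are assumptions; the rest match the outputs shown)
hidden_size = 128      # matches the [*, 128] shapes printed in section 2.1.1
n_layers = 1
chunk_len = 200        # length of each random training segment
n_iters = 10000        # total training iterations (see the log below)
print_every = 500      # print the average loss every 500 iterations
plot_every = 50        # record a loss sample for the plot (assumed value)
learning_rate = 0.005  # assumed value

model = CharRNN(n_characters, hidden_size, n_characters, n_layers).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()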
4.2 Training Loop
Show Code
def time_since(since: float) -> str:
    s = time.time() - since
    m = math.floor(s / 60)
    s -= m * 60
    return f"{m}m {s:.0f}s"

all_losses = []
running_loss = 0.0
start = time.time()

model.train()
for it in range(1, n_iters + 1):
    # Get a random training example
    input_tensor, target_tensor = get_training_batch(chunk_len)
    loss = train_step(
        model=model,
        optimizer=optimizer,
        criterion=criterion,
        input_tensor=input_tensor,
        target_tensor=target_tensor,
    )
    running_loss += loss

    if it % print_every == 0:
        avg = running_loss / print_every
        print(f"[{time_since(start)}] iter {it}/{n_iters} | loss {avg:.4f}")
        running_loss = 0.0

    if it % plot_every == 0:
        # (Note: this records the most recent single-step loss; simple and good enough for a demo)
        all_losses.append(loss)

print("Training finished.")
[1m 29s] iter 500/10000 | loss 1.9262
[2m 41s] iter 1000/10000 | loss 1.5714
[3m 43s] iter 1500/10000 | loss 1.4704
[4m 47s] iter 2000/10000 | loss 1.3964
[5m 50s] iter 2500/10000 | loss 1.3304
[6m 50s] iter 3000/10000 | loss 1.3200
[7m 48s] iter 3500/10000 | loss 1.2912
[8m 49s] iter 4000/10000 | loss 1.2752
[9m 50s] iter 4500/10000 | loss 1.2663
[10m 47s] iter 5000/10000 | loss 1.2511
[11m 53s] iter 5500/10000 | loss 1.2561
[12m 53s] iter 6000/10000 | loss 1.2489
[14m 2s] iter 6500/10000 | loss 1.2206
[15m 13s] iter 7000/10000 | loss 1.2200
[16m 14s] iter 7500/10000 | loss 1.2333
[17m 15s] iter 8000/10000 | loss 1.2115
[18m 13s] iter 8500/10000 | loss 1.2353
[19m 13s] iter 9000/10000 | loss 1.2139
[20m 13s] iter 9500/10000 | loss 1.2244
[21m 13s] iter 10000/10000 | loss 1.2092
Training finished.
4.3 Loss Visualization
Show Code
plt.figure(figsize=(8, 4))
plt.plot(all_losses)
plt.title("Training Loss (sampled)")
plt.xlabel(f"Step (every {plot_every} iters)")
plt.ylabel("Loss")
plt.grid(True, alpha=0.3)
plt.show()
5. Model Inference & Export
Show Code
def generate(model, start_str='Once upon a time', predict_len=100, temperature=0.8):
    model.eval()  # Set to evaluation mode
    hidden = model.init_hidden()

    # Convert the start string to tensors.
    # We loop through the start string to "warm up" the hidden state.
    for i in range(len(start_str) - 1):
        inp = line_to_tensor(start_str[i])
        _, hidden = model(inp, hidden)

    # The last character of start_str is our first actual input for generation
    inp = line_to_tensor(start_str[-1])
    predicted = start_str

    for p in range(predict_len):
        output, hidden = model(inp, hidden)

        # Sample from the network output.
        # Divide by temperature to control randomness (lower = more predictable).
        output_dist = output.data.view(-1).div(temperature).exp()

        # Select one index based on the probability distribution
        top_i = torch.multinomial(output_dist, 1)[0]

        # Convert the index back to a character
        char = index_to_char(top_i.item())
        predicted += char

        # Use the predicted character as the next input
        inp = line_to_tensor(char)

    return predicted

# Run the generation
print("--- Generated Text (Temp=0.8) ---")
print(generate(model, start_str="The", predict_len=400, temperature=0.8))
print("\n--- Generated Text (Temp=0.5 - More Conservative) ---")
print(generate(model, start_str="The", predict_len=400, temperature=0.5))
--- Generated Text (Temp=0.8) ---
The merchants-dance sure the sprime an flapping bead of began to reckates one can way brange when the Second cleeps son, for he would have gone heart range to
the Linnet; I have least had besides; of the Princess of swould not understaing his
will heart of friend at resure, and was she sang
exchank to where I will be a great shall
that the Student in the garden rather of commons, but I lovely sho
--- Generated Text (Temp=0.5 - More Conservative) ---
The sat the lips arms of the most of the lips a great shall in the spring to me any on the Rocket of the most scried the Prince of her friendship, and the red rose so she was fur chain, and when the Rocket; but not say of a large broar of the morations could be a basket of the Rocket of the workment than the sailed out of the world the window as he went the statue of a
red rose, for a great of himse
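The section title also mentions export. That cell is not shown here, but saving the trained weights so the follow-up page can reload them could look like the sketch below (the filename char_rnn.pth is an arbitrary choice):

# Save the trained weights
torch.save(model.state_dict(), "char_rnn.pth")

# Later (e.g. on the generation page), rebuild the architecture and load the weights
loaded_model = CharRNN(n_characters, hidden_size, n_characters, n_layers).to(device)
loaded_model.load_state_dict(torch.load("char_rnn.pth", map_location=device))
loaded_model.eval()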